Measurement of nucleic acid variants using highly-multiplexed error-suppressed deep sequencing

ABSTRACT

Methods and compositions are disclosed for measuring low-abundance DNA variants from a complex mixture of DNA molecules. Embodiments of the methods allow for extremely sensitive detection and can distinguish true variants from sequencer misreads and PCR misincorporations.

GOVERNMENTAL INTERESTS

The research leading to this application was funded by the National Institutes of Health from grant RR014139. The government has certain rights in this invention.

BACKGROUND

Tumor-derived DNA is released into the bloodstream from dying cancer cells in patients with various types of malignancies. Such circulating tumor DNA (ctDNA) is showing excellent promise as a non-invasive cancer biomarker. However, developing an assay that is capable of exploiting ctDNA for early cancer detection presents several challenges. In the bloodstream, ctDNA can be distinguished from normal background DNA based on the presence of tumor-specific mutations. However, mutant ctDNA is usually only present in small amounts, having been previously reported to comprise an average of 0.2% of total plasma DNA (Diehl et al., Nat Med. 2008; 14: 985-990). If variant DNA sequences are low in abundance, detecting and quantifying these variants can be more challenging. Small amounts of mutation-harboring ctDNA can be obscured by a relative excess of background wild-type plasma DNA. Thus, an assay with extremely high detection sensitivity is required.

There is a need for a method that is able to detect and quantify rare variant sequences to detect cancers in situations where the amount of DNA in a given sample is limited. Unlike existing approaches, a test should be able to evaluate an entire panel of mutation-prone regions without needing to divide DNA samples into separate reactions (which could reduce detection sensitivity by providing fewer template DNA copies per reaction). Methods and compositions are described herein that provide a multiplex assay to detect minute amounts of ctDNA and address the current deficiencies in assaying ctDNA.

SUMMARY

Described herein are compositions and methods relating to next-generation sequencing and medical diagnostics. Methods include identifying and quantifying nucleic acid variants, particularly those available in low abundance or those obscured by an abundance of wild-type sequences. Also described herein are methods related to identifying and quantifying specific sequences from a plurality of sequences amid a plurality of samples. Methods as described herein also include detecting and distinguishing true nucleic acid variants from misincorporation errors, sequencer errors, and sample misclassification errors. Methods include early attachment of barcodes and molecular lineage tags (MLTs) to targeted nucleic acids within a sample. Methods also include clonal overlapping paired-end sequencing to achieve sequence redundancy.

In an embodiment, a method includes measuring nucleic acid variants by tagging and amplifying low abundance template nucleic acids in a multiplexed primer extension or PCR. Low abundance template nucleic acids may be fetal DNA, circulating tumor DNA (ctDNA), viral RNA, viral DNA, DNA from a rejected transplanted organ, or bacterial DNA. A multiplex PCR may include gene-specific primers, wherein primers are specific for a mutation-prone region (e.g., within KRAS, EGFR, etc.). In an embodiment, a mutation-prone region may be associated with cancer. As disclosed herein, a multiplex PCR can include more than one round of PCR and/or primer-extension. In an embodiment, a multiplex PCR can include two or three rounds of PCR.

In an embodiment, primers comprise a barcode and/or a molecular lineage tag (MLT). In an embodiment, an MLT can be 2-10 nucleotides. In another embodiment, an MLT can be 6, 7, or 8 nucleotides. In an embodiment, a barcode can identify the sample of template nucleic acid. In an embodiment, a PCR reaction mixture includes template nucleic acids from multiple samples (e.g., patients), wherein the barcode identifies the sample origin of the template nucleic acid. In an embodiment, a primer extension reaction employs targeted early barcoding. In targeted early barcoding, a plurality of different primers specific for different nucleic acid regions all have an identical barcode. An identical barcode identifies the nucleic acids from a particular sample. In an embodiment, primers used for targeted early barcoding are produced by combining a unique barcode-containing oligonucleotide segment with a uniform mixture of gene-specific primer segments in a modular fashion.

In an embodiment, multiplex assays described herein can be used for clinical purposes. In an embodiment, nucleic acid variants within blood can be identified and measured before and after treatment. In an example of cancer, a nucleic acid variant (e.g., cancer-related mutation) can be identified and/or measured prior to treatment (e.g., chemotherapy, radiation therapy, surgery, biologic therapy, combinations thereof). Then after treatment, the same nucleic acid variant can be identified or measured. After treatment, a decrease or absence of the nucleic acid variant can indicate that the therapy was successful.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic of a copying, tagging, amplification, and sequencing process. Two rounds of limited-cycle PCR were performed to attach barcode sequences and molecular lineage tag sequences to copies of targeted template DNA fragments. Stringent purification of the PCR products was carried out between rounds in order to remove unextended primers, spurious extension products, and template DNA. A final round of PCR was performed using universal primers to further amplify the purified products from Round 2. Final amplification products were gel-purified and subjected to clonal overlapping paired-end massively parallel sequencing. Use of primers synthesized from modular segments allowed early barcoding of targeted template DNA during the first round of PCR, enabling subsequent steps to be performed in a combined reaction volume.

FIG. 2 illustrates a general approach to combining modular oligonucleotide segments to produce mixtures of gene-specific barcoded primers. Primers produced by combining modular segments allow primer-extension and early barcoding of multiple targeted gene regions in a given sample. A primer mix used for a particular sample may have a unique barcode and multiple gene-specific primer sequences. For a different sample, the mix can have the same set of gene-specific sequences in identical ratios, but a different barcode.

FIG. 3 illustrates a method for producing combinations of modular oligonucleotide segments using an automated oligonucleotide synthesizer. First, gene-specific 3′-segments of oligonucleotides were synthesized on solid supports on separate synthesis columns. The oligonucleotides were synthesized in a 3′ to 5′ direction. The synthesis was then paused, and the partially-synthesized oligonucleotides were left in a protected state on solid support particles. The contents of all columns were evenly mixed, and the mixture of solid support particles was then dispensed into separate fresh columns. Synthesis of the barcode-containing 5′-segment of the oligonucleotides was then continued in the new columns. A uniquely barcoded 5′-segment was added in each column. After cleavage, deprotection and purification, the resulting barcoded oligonucleotide mixtures all had identical ratios of 3′-segments.

FIG. 4A and FIG. 4B are schematics of error-suppressed multiplexed deep sequencing. FIG. 4A shows cell-free DNA purified from plasma undergoing two rounds of amplification by PCR. The first round amplifies mutation hotspot regions of several genes from a given sample in a single tube. The second round separately amplifies each hotspot region using nested primers incorporating unique combinations of barcodes to label distinct samples. The barcoded PCR products are then pooled and subjected to deep sequencing. Millions of sequences are sorted and counted to determine the ratio of mutant to wild-type molecules derived from each sample. The total number of plasma DNA fragments is measured by real-time PCR and can be used to calculate the absolute concentration of mutant ctDNA. FIG. 4B shows that sequence redundancy in mutation hotspot regions is produced by partial overlap of paired-end reads from the forward and reverse strands of each clone. This yields highly accurate base-calls, permitting detection and quantitation of rare mutations with greater sensitivity.
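The quantitation described for FIG. 4A can be illustrated with a short sketch. It assumes that the mutant fraction is taken as mutant counts divided by the sum of mutant and wild-type counts, and that the real-time PCR result is expressed as genome equivalents per sample. The function name and the example numbers are hypothetical and are not taken from the disclosed data.

```python
def mutant_molecules_per_sample(mutant_reads, wildtype_reads, genome_equivalents):
    """Estimate the number of mutant template molecules in a sample.

    Assumption: sequence counts are proportional to template molecules, so the
    mutant fraction of reads can be scaled by the total number of templates
    (genome equivalents) measured separately by real-time PCR.
    """
    total_reads = mutant_reads + wildtype_reads
    if total_reads == 0:
        raise ValueError("no reads were assigned to this sample")
    mutant_fraction = mutant_reads / total_reads
    return mutant_fraction * genome_equivalents


# Hypothetical example: 20 mutant reads among 10,000 total reads,
# with 10,000 genome equivalents measured in the sample.
estimate = mutant_molecules_per_sample(20, 9980, 10000)
print(f"estimated mutant molecules: {estimate:.1f}")
```

Dividing the estimate by the plasma volume that was processed would give a concentration; that volume is not part of this sketch.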

FIGS. 5A-C are graphs depicting suppression of spurious mutation counts to reveal low-abundance variants. Each bar indicates the frequency of a particular deviation from the wild-type sequence occurring within the codon 12/13 hotspot region of KRAS. The tested sample contains 0.2% DNA derived from a lung cancer cell line that is known to be homozygous for a KRAS Gly12Ser mutation. FIG. 5A shows that filtered reads from one end of the amplicon have relatively frequent mismatches when directly compared to the wild-type sequence. Data from 3 replicate amplifications are shown. FIG. 5B indicates that sequencer errors are greatly reduced by requiring both partially overlapping paired-end reads from each clone to exactly match a specific mutation. The Gly12Ser mutation is now readily distinguished from the remaining low-level errors that were likely introduced during DNA amplification and processing. Insertions and deletions are no longer seen in this region after requiring agreement of overlapped reads. FIG. 5C shows that a further reduction in the relative error level can be achieved by calculating the mean values of 3 replicate measurements, since mutations found in the original DNA sample should produce more consistent counts than randomly occurring errors.

FIGS. 6A-C indicate the performance of error-suppressed deep sequencing. Measurements of DNA extracted from mutant and wild-type cancer cell lines mixed in various ratios ranging from 1:10,000 to 10,000:1 show a high degree of accuracy and reproducibility. FIG. 6A is a linear plot of DNA from the KRAS-mutant cell line over the range of concentrations tested. FIG. 6B and FIG. 6C show that BRAF- and EGFR-mutant lines, respectively, contained a small amount of wild-type DNA, thereby yielding a plateau at higher mutant to wild-type ratios. Non-linear least-squares fits were performed using the equation y=10^(slope*log((1−C)*x/(C*x+1))+intercept) where C was the fraction of wild-type molecules found in DNA extracted from mutant cell lines. Error bars indicate the standard deviation of 3 measurements.
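The fitting equation given for FIGS. 6A-C can be written as a small function to show why a plateau appears. This sketch assumes that "log" denotes the base-10 logarithm and that x is the nominal mixing ratio of mutant-line DNA to wild-type-line DNA; neither is stated explicitly in the text. Under that reading, mixing x parts of mutant-line DNA with 1 part of wild-type DNA yields (1−C)*x mutant molecules and C*x+1 wild-type molecules, which is the ratio inside the logarithm. The function and parameter names are hypothetical.

```python
import math


def expected_measured_ratio(x, slope, intercept, c):
    """Evaluate y = 10^(slope*log10((1-c)*x/(c*x+1)) + intercept).

    x: nominal mutant:wild-type mixing ratio (assumed interpretation)
    c: fraction of wild-type molecules in DNA from the mutant cell line
    """
    true_ratio = (1.0 - c) * x / (c * x + 1.0)
    return 10.0 ** (slope * math.log10(true_ratio) + intercept)


# With a pure mutant line (c = 0), slope 1 and intercept 0, the model is y = x.
print(expected_measured_ratio(100.0, 1.0, 0.0, 0.0))

# With 1% wild-type contamination, y levels off near (1 - c) / c = 99.
for x in (1.0, 100.0, 10000.0, 1e6):
    print(x, round(expected_measured_ratio(x, 1.0, 0.0, 0.01), 3))
```

Fitting slope, intercept, and C to measured data would require a non-linear least-squares routine, which is omitted here.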

FIGS. 7A-C show the changes in ctDNA levels with treatment or disease progression. Measurements of mutant ctDNA from patients with NSCLC are shown at various times in relation to therapeutic interventions and disease status. ctDNA was considered undetectable if sequence counts yielded a quantity of less than one mutant molecule per sample. Median genome equivalents per sample, as determined by real-time PCR, were 9602 (IQR=5412-11513). FIG. 7A shows the timeline of treatment for Patient 3 who had stage IV lung adenocarcinoma with a 4.3 cm right upper lobe tumor and large metastases in the abdomen and supraclavicular region. She was treated concurrently with an experimental histone deacetylase (HDAC) inhibitor and palliative radiation therapy directed at her painful 6.9 cm supraclavicular lesion. She began chemotherapy treatment shortly afterwards. FIG. 7B shows the timeline of treatment for Patient 5 who had a 7.5 cm lung adenocarcinoma with eight small brain metastases ranging from 3 mm to 15 mm in size at presentation. He was treated with palliative whole-brain radiation therapy, followed by long-term weekly chemotherapy. Follow-up imaging revealed an excellent, durable response with shrinkage of the lung tumor to ~15% of its original volume at 7 months after diagnosis. No evidence of disease progression was seen during this time period. FIG. 7C shows the timeline of treatment for Patient 9 who underwent definitive radiation treatment for locally advanced, stage IIIB undifferentiated NSCLC. Other health conditions prevented him from undergoing surgery or concurrent chemotherapy. Blood sample collection commenced upon completion of his treatment. Although his disease was confined to the thorax prior to initiating radiation therapy, a PET scan performed 8 weeks after treatment showed marked progression of disease with multiple osseous, hepatic, and subcutaneous metastases. He expired 10 weeks after completing treatment.

FIGS. 8A-C show the ctDNA levels in patients with fewer than 3 time-points. FIG. 8A shows the timeline of treatment for Patient 11 who had stage IV lung adenocarcinoma with widespread metastatic disease in the bones of her spine, ribs, sternum, clavicle, humerus, and pelvis. She was treated with a short course of palliative radiation therapy for a pathologic fracture in her lumbar spine. A single blood sample was obtained on her last day of treatment. She passed away approximately 1 week after completion of therapy. FIG. 8B shows the timeline of treatment for Patient 14 who had stage IV lung adenocarcinoma. He received palliative radiation therapy to a painful 8.9×5.9×4.9 cm lesion in his left posterior chest wall, given concurrently with an experimental histone deacetylase (HDAC) inhibitor. He had additional metastatic lesions in his liver, kidneys, and peri-splenic region. He was hospitalized for profound weakness 10 days post-treatment, and expired shortly afterwards. FIG. 8C shows the timeline of treatment for Patient 15 who had stage IV undifferentiated NSCLC with metastasis in the supraclavicular and inguinal regions, as well as several small tumors in his brain. His brain lesions were treated with single-fraction stereotactic radiosurgery. He then began palliative radiation therapy for a painful 7.1 cm left upper lobe lung mass, which was threatening to obstruct his left mainstem bronchus. He received concurrent treatment with an HDAC inhibitor as part of a clinical trial. He passed away unexpectedly after receiving 8 of 10 planned radiation treatments.

FIG. 9 shows a scheme for appending modular barcodes to gene-specific primers. An ability to combine barcodes and gene-specific primers in a modular fashion provides flexibility to modify or expand a panel of genes or number of samples being tested. A gene-specific primer was added to the 3′-end of a barcoded oligonucleotide by polymerization on a biotinylated template. A biotin tag was used to capture a double-stranded product onto streptavidin resin. A barcoded primer was then released into solution by heat-denaturation. In a similar manner, a mixture of biotinylated templates can be used to produce a mixture of gene-specific primers, all having the same barcode. Separate reactions use different barcoded oligonucleotides to produce uniquely barcoded primer mixes that can be used for targeted early barcoding.

FIG. 10 is a schematic of a process described in Example 2. First, a primer extension step was carried out using primers that assign sample-specific barcodes and MLT sequences to copied template DNA. For a given sample, multiple targeted sequences were copied using multiple gene-specific primers, all bearing the same sample-specific barcode. After stringent purification of specifically extended products, a pre-amplification step was performed in order to produce many copies of the tagged molecules. This allowed splitting of the products into different tubes for separate amplification of each target site, while ensuring that copies of the original templates are adequately sampled. The products of the final PCRs were combined and subjected to clonal overlapping paired-end deep sequencing. Nested primers were used to enhance target specificity at each step.

FIG. 11 shows a workflow of a process described in Example 2. Separate primer-extension reactions were initially carried out for each sample. Barcoded products were then mixed into a single volume for purification and pre-amplification steps. Purified products were then split into separate tubes and underwent final single-target PCR in separate reaction volumes.

FIG. 12 shows an example of a Round 1 reverse primer sequence, highlighting various elements of the sequence. Note that the gene-specific sequence at the 3′-end can act as a primer for either PCR or primer-extension by a DNA polymerase. The 5′-segment contains a sample-specific barcode sequence, a Molecular Lineage Tag (MLT), as well as adapter sequences required by the next-generation sequencing platform. In this example, a gene-specific segment is specific for a mutation-prone region of TP53. For Round 1 PCR or primer-extension of a given sample, a mixture of several reverse primers would be used, all having the same sample-specific barcode sequence, and multiple different gene-specific sequences. A similar mixture, but with another barcode, would be used for Round 1 PCR or primer-extension of a different sample.

FIG. 13 is a schematic of a process of splint-mediated ligation of modular oligonucleotide segments. A 5′-segment containing a particular barcode sequence can be ligated to a mixture of 3′-segments having a variety of gene-specific primer sequences using a biotinylated splint oligonucleotide. Hybridization to the splint oligonucleotide is mediated by common annealing sequences. A 5′-phosphate is necessary on the 3′-segments to permit enzymatic ligation. The biotinylated splint can be used to capture, wash, and elute the ligated products.

FIG. 14 shows elements of a sequence output using the Illumina® platform. Read 1 and read 3 are from opposite strands, and provide sequence redundancy via overlap in the mutation-prone region. This clonal redundancy allowed sequences resulting from sequencer errors to be identified and discarded, permitting greater sensitivity for detection of rare sequence variants.

FIG. 15 shows hypothetical processing of data from sequences assigned to a single gene and a single barcode. The example illustrates how analysis of variant sequences and associated molecular lineage tags can be performed.

FIG. 16 shows processed data from sequences generated using methods described in Example 2. Data are shown for a single gene and a single barcode. Symbols used in the mismatch table are as defined in FIG. 15. MLT counts associated with variant sequences are displayed in the format “N×Z” where N is the number of copies of a particular MLT sequence, and Z is the number of different MLT sequences having N copies.
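The “N×Z” notation can be illustrated with a short sketch that tallies MLT sequences observed for one variant. The function name, the example MLT sequences, the descending sort order, and the use of a space between entries are assumptions made for illustration only.

```python
from collections import Counter


def summarize_mlt_counts(mlt_sequences):
    """Summarize observed MLT sequences as "N×Z" entries.

    N is the number of copies of a particular MLT sequence and Z is the
    number of different MLT sequences that were each seen N times.
    """
    copies_per_mlt = Counter(mlt_sequences)        # MLT sequence -> N
    z_per_n = Counter(copies_per_mlt.values())     # N -> Z
    # Listing larger N first is an arbitrary choice for readability.
    return " ".join(f"{n}×{z}" for n, z in sorted(z_per_n.items(), reverse=True))


# Hypothetical MLTs: two tags seen three times each and one tag seen once.
observed = ["ACGTACGT"] * 3 + ["TTGACCAG"] * 3 + ["GGCATTCA"]
print(summarize_mlt_counts(observed))  # prints "3×2 1×1"
```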

FIG. 17 shows an ethidium bromide-stained 2% agarose gel containing products of Round 3 PCR. A marker lane contained a 100 base-pair ladder for size comparison. The gel shows a diffuse band containing amplified products within the expected size range, and very little spurious product migrating at a different size.

FIG. 18 shows processed data from sequences generated from the methods described in Example 3. Data are shown for a single gene and a single barcode. Results are displayed in a format similar to FIG. 15, but in this case analysis of two separate MLT sequence regions (MLT-1 and MLT-2) was performed for variant and wild-type sequences. To report these counts in a succinct format, MLT counts are binned by powers of two. For example, an MLT-1 count of 13 would be placed into bin 4 (because 2^4 is the smallest power of 2 that is greater than or equal to 13). Thus, a report of 4×5 means that there were five instances of counts in the range of 9 to 16. Similarly, a report of 3×6 means that there were six instances of counts in the range of 5 to 8. For a given collection of MLT-1 counts, the associated MLT-2 counts were reported in a similar format, to the right of the MLT-1 counts and separated by colons. For example, 4×5:2×3:1×7 meant that among 5 sets of MLT-1 sequences occurring between 9 and 16 times, there were 3 instances of MLT-2 sequences that occurred between 3 and 4 times, and 7 instances of MLT-2 sequences that occurred twice. Different MLT-1 bins were separated by a space.
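The power-of-two binning rule can be sketched as follows. The bin of a count is the exponent of the smallest power of 2 that is greater than or equal to the count, so a count of 13 falls in bin 4. The function names are hypothetical, and the sketch covers only a single level of MLT counts, not the nested MLT-1:MLT-2 report.

```python
from collections import Counter


def power_of_two_bin(count):
    """Return b such that 2**b is the smallest power of 2 >= count."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return (count - 1).bit_length()


def bin_range(b):
    """Return the inclusive range of counts that fall into bin b."""
    return (1, 1) if b == 0 else (2 ** (b - 1) + 1, 2 ** b)


def binned_report(counts):
    """Format counts as "bin×instances" entries, larger bins first."""
    bins = Counter(power_of_two_bin(c) for c in counts)
    return " ".join(f"{b}×{n}" for b, n in sorted(bins.items(), reverse=True))


print(power_of_two_bin(13))                   # 4
print(bin_range(4), bin_range(3))             # (9, 16) (5, 8)
print(binned_report([13, 9, 16, 10, 12, 6]))  # 4×5 3×1
```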

DETAILED DESCRIPTION

Definitions

The terms “nucleic acid,” “nucleotide,” “polynucleotide,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

The term “base”, in its singular form, refers to a single residue within a nucleic acid molecule or to a single position within a nucleic acid sequence read.

The term “biological sample” refers to a body sample from any animal, but preferably is from a mammal, more preferably from a human. Such samples include biological fluids such as serum, plasma, vitreous fluid, lymph fluid, synovial fluid, follicular fluid, seminal fluid, amniotic fluid, milk, whole blood, urine, cerebro-spinal fluid, saliva, sputum, tears, perspiration, mucus, and tissue culture medium, as well as tissue extracts such as homogenized tissue, and cellular extracts.

As used herein, “buffer” refers to a buffered solution that resists changes in pH by the action of its acid-base conjugate components. Buffers may optionally comprise a salt such as MgCl₂, MnCl₂, or the like. Buffers may also optionally comprise other constituents to improve the efficiency of reverse transcription or amplification, including, but not limited to, betaine, dimethyl sulfoxide, surfactant, bovine serum albumin, etc.

The term “cDNA” refers to a complementary DNA molecule synthesized using a ribonucleic acid strand (RNA) as a template. RNA may be mRNA, tRNA, rRNA, microRNA, or another form of RNA, such as viral RNA. The cDNA may be single-stranded, double-stranded or may be hydrogen-bonded to a complementary RNA molecule as in an RNA/cDNA hybrid.

The term “polymerase chain reaction” or “PCR” refers to a procedure or technique in which minute amounts of nucleic acid, RNA and/or DNA, are amplified as described in U.S. Pat. No. 4,683,195 issued Jul. 28, 1987. Generally, sequence information from the ends of the region of interest or beyond needs to be available, such that oligonucleotide primers can be designed; these primers will be identical or similar in sequence to opposite strands of the template to be amplified. The 5′ terminal nucleotides of the two primers may coincide with the ends of the amplified material. PCR can be used to amplify specific RNA sequences, specific DNA sequences from total genomic DNA, and cDNA transcribed from total cellular RNA, bacteriophage or plasmid sequences, etc. See generally Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51:263 (1987); Erlich, ed., PCR Technology, (Stockton Press, NY, 1989).

The term “reverse transcription polymerase chain reaction” or “RT-PCR” refers to the transcription of cDNA from an RNA template by the enzyme reverse transcriptase. The cDNA is then amplified by known PCR methods.

The term “primer-extension” refers to an enzymatic process whereby a primer is hybridized to a template nucleic acid strand and is polymerized using said strand as a template. Polymerization can be mediated by enzyme classes including but not limited to DNA polymerases or reverse transcriptases. Primer-extension can take place as an isolated reaction (single extension of a primer on a template), or as part of a repetitive process such as PCR.

The term “primer” refers to an oligonucleotide capable of acting as a point of initiation of synthesis along a complementary strand when conditions are suitable for synthesis of a primer extension product. The synthesizing conditions include the presence of four different deoxyribonucleotide triphosphates (dNTPs) and at least one polymerization-inducing agent such as reverse transcriptase or DNA polymerase. These are present in a suitable buffer, which may include constituents which are co-factors or which affect conditions such as pH and the like at various suitable temperatures. A primer is preferably a single strand sequence, such that amplification efficiency is optimized, but double stranded sequences can be utilized. A primer can have some sequences that are not designed to hybridize to the targeted template DNA, including sequences at the 5′-end of the primer that become incorporated into the amplified products. Such sequences can include universal primer binding sites to be used in subsequent amplifications, sample-specific barcodes, or molecular lineage tags. In addition to serving the purpose of copying a nucleic acid template, a primer can also be used to append labels or other sequences to the copied products. Primers and other synthetic oligonucleotides disclosed herein have undergone either polyacrylamide gel purification or reverse-phase cartridge purification unless otherwise specified. A primer can also be modified by attachment of one or more chemical moieties including but not limited to biotin, a fluorescent tag, a phosphate, or a chemically reactive group.

The term “gene-specific primer” refers to a primer that is designed to hybridize to and be extended on a particular nucleic acid target. The 3′-segment of a gene-specific primer is complementary to its targeted RNA or DNA sequence, but other portions of the primer need not be complementary to any target. The target need not be a “gene” in the strict sense of the word. Possible targets include but are not limited to genomic DNA, mitochondrial DNA, viral DNA, mRNA, microRNA, viral RNA, tRNA, rRNA, and cDNA.

The term “nested primer” refers to a primer that is designed to hybridize to a primer-extended or PCR amplified product at a position that is either entirely or partially within the target region that was flanked by the original primers. The 3′-end of a nested primer is complementary to target sequences that would not have been contained within the original primers, but rather would have been copied by extension of the original primers on the desired template. Nested primers thus provide additional specificity for copying or amplifying a desired target after an initial round of primer-extension or PCR.

The terms “reaction mixture” or “PCR reaction mixture” or “PCR master mix” refer to an aqueous solution of constituents in a PCR or RT-PCR reaction that can be constant across different reactions. An exemplary PCR reaction mixture includes buffer, a mixture of deoxyribonucleoside triphosphates, reverse transcriptase, primers, probes, and DNA polymerase. Generally, template DNA is the variable in a PCR reaction.

The terms “sequence variant” or “mutation” are used interchangeably and refer to any variation in a nucleic acid sequence including but not limited to single point-mutations, multiple point-mutations, insertions/deletions (indels), and single-nucleotide polymorphisms (SNPs). These terms are used interchangeably in this document, and it is understood that when reference is made to a method for evaluating one type of variant, it could be equally applied to evaluation of any other type of variant. The term “variant” can also be used to refer to a single molecule whose sequence deviates from a reference sequence, or a collection of molecules whose sequences all deviate from the reference sequence in the same way. Similarly, “variant” can refer to a single sequence (or read) that deviates from a reference sequence or a set of sequences that deviate from a reference sequence.

The terms “mutation-prone region” and “mutation hotspot” are used interchangeably, and refer to a sequence region of a nucleic acid obtained from a biological source that has a higher probability of being mutated than surrounding sequence regions within the same nucleic acid. In the case of tumor-derived DNA, mutation-prone regions can be found in certain cancer-related genes. The mutation-prone region can be of any length, but mutation-prone regions that are analyzed using the methods disclosed herein are less than 100 nucleotides long. A mutation can be found anywhere within a mutation-prone region.

The term “target region” refers to a region of a nucleic acid that is targeted for primer extension or PCR amplification by specific hybridization of complementary primers.

The term “clonal overlapping paired-end sequencing” refers to a massively parallel sequencing method in which paired-end reads are obtained for each clonal sequence such that portions of the two reads from opposite strands are able to cover the same region of DNA. This approach is used to reduce, suppress, or distinguish sequencer-derived errors, thereby allowing base-calls to be made with greater confidence. The region of DNA that is covered by the overlapping reads is effectively read twice in opposite directions, once from each strand of the duplex. Thus, by including the mutation-prone region within the area of sequence overlap, the mutation-prone region is read in one direction and then proofread in the opposite direction. Read-pairs that do not have perfect sequence consistency in the overlapping region (after obtaining a reverse-complement of one of the reads) can be attributed to sequencer error and can be discarded from the analysis. This approach greatly reduces the background of sequencer-generated errors and allows rare mutant molecules to be detected with greater sensitivity.
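The proofreading step in this definition can be sketched in a few lines. The sketch assumes that both reads have already been trimmed to the shared overlapping region, so locating the overlap within each read is out of scope. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse-complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def proofread_overlap(forward_overlap, reverse_overlap):
    """Return the agreed sequence of the overlap, or None if the reads disagree.

    forward_overlap: overlap region as read from one strand
    reverse_overlap: the same region as read from the opposite strand
    A read-pair without perfect consistency is treated as a sequencer error
    and is discarded by returning None.
    """
    if forward_overlap == reverse_complement(reverse_overlap):
        return forward_overlap
    return None


# Hypothetical overlap region read from both strands of one clone.
forward = "GGTGGCGTAG"
reverse = reverse_complement(forward)
print(proofread_overlap(forward, reverse))   # reads agree: GGTGGCGTAG

# A single mismatch in one read causes the pair to be discarded.
corrupted = "A" + reverse[1:]
print(proofread_overlap(forward, corrupted))  # None
```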

The terms “barcode”, “tag”, and “index” are used interchangeably and refer to a sequence of bases at certain positions within an oligonucleotide that is used to identify a nucleic acid molecule as belonging to a particular group. A barcode is often used to identify molecules belonging to a certain sample when molecules from several samples are combined for processing or sequencing in a multiplexed fashion. A barcode can be any length, but is usually between 6 and 12 bases long (need not be consecutive bases). Barcodes are usually artificial sequences that are chosen to produce a barcode set, such that each member of the set can be reliably distinguished from every other member of the set. Various strategies have been used to produce barcode sets. One strategy is to design each barcode so that it differs from every other barcode in the set at a minimum of 2 distinct positions.
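The design strategy mentioned last can be checked programmatically. The following sketch computes the number of differing positions (Hamming distance) between every pair of equal-length barcodes and verifies that the minimum is at least 2, so that a single base substitution cannot convert one barcode into another. The function names and barcode sequences are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_valid_barcode_set(barcodes, min_distance=2):
    """Check that every pair of barcodes differs at >= min_distance positions."""
    return all(
        hamming_distance(a, b) >= min_distance
        for a, b in combinations(barcodes, 2)
    )


# Hypothetical 6-base barcodes.
print(is_valid_barcode_set(["ACGTAC", "TGCATG", "ACGTTG"]))  # True
print(is_valid_barcode_set(["ACGTAC", "ACGTAG"]))            # False: 1 difference
```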

The term “sample-specific barcode” refers to a barcode sequence that is assigned to molecules that are derived from a particular sample.

The term “template nucleic acid” refers to any nucleic acids that can serve as targets for primer-extension, reverse-transcription, or PCR. A template nucleic acid can be DNA or RNA. Methods described herein for analysis of DNA can also be applied to the analysis of RNA after reverse-transcribing the RNA to produce cDNA. Methods for evaluating DNA can be equally applied to the evaluation of RNA.

The terms “deep sequencing” and “ultra-deep sequencing” are used interchangeably herein and refer to approaches that use massively parallel sequencing technologies to obtain large numbers of sequences corresponding to relatively short, targeted regions of the genome. A targeted region can include, for example, an entire gene or small segment of a gene (such as a mutation hotspot). In some cases, many thousands of clonal sequences are obtained from a short targeted segment allowing identification and quantitation of sequence variants.

The term “clonal sequence” refers to a sequence that is derived from a single molecule within a sample that is subjected to massively parallel sequencing. Specifically, each clonal sequence that is generated by massively parallel sequencing is derived from a distinct DNA molecule within a sample that serves as the “input” for the sequencing workflow.

The terms “targeted early barcoding”, “early barcoding”, “attachment of early barcodes”, and “assignment of early barcodes” are used interchangeably and refer to assignment of barcodes to selected nucleic acid targets within a sample by specific hybridization and polymerization of barcode-containing primers at an early processing step. Preferably, barcode assignment occurs during the first enzymatic step that is performed after template nucleic acid molecules are purified from a biological sample. This first enzymatic step can be primer-extension, reverse-transcription, or PCR. When multiple different target sequences are to be tagged and copied from a given sample, a mixture of several different target-specific primers is used in a single reaction volume, with every primer in the mixture having the same sample-specific barcode. Separate early barcoding reactions are carried out for each sample, using similar mixtures of primers bearing distinct barcodes for each sample. Targeted early barcoding allows molecules from different samples to be combined into a single volume for all subsequent processing steps.

The term “degenerate sequence” refers to a stretch of sequence in which, within a population of nucleic acid molecules, two or more different bases can be found at each position. Most often, degenerate sequences are produced such that there is an approximately equal probability of each position having A, C, G, or T (in the case of DNA), or having A, C, G, or U (in the case of RNA). However, in certain situations, different bases can be incorporated in varying ratios at different positions, and some bases can be omitted at certain positions if desired. A degenerate sequence can be of any length.

The terms “molecular lineage tag”, “MLT”, and “MLT sequence” are used interchangeably and refer to a stretch of degenerate sequence that is contained within a synthetic oligonucleotide (e.g. a primer) and is used to assign a set of diverse sequence tags to copies of template nucleic acid molecules. A molecular lineage tag is designed to have between 2 and 10 degenerate base positions, but preferably has between 6 and 8 base positions. The bases need not be consecutive, and can be separated by constant sequences. The number of possible MLT sequences that can be generated in a population of oligonucleotide molecules is generally determined by the length of the MLT sequence and the number of possible bases at each degenerate position. For example, if an MLT is 8 bases long, and has an approximately equal probability of having A, C, G, or T at each position, then the number of possible sequences is 4^8 = 65,536. A molecular lineage tag is not designed to assign a completely unique sequence tag to each molecule, but rather is designed to have a low probability of assigning any given sequence tag to a particular molecule. The greater the number of possible MLT sequences, the lower the probability of any particular sequence being assigned to a molecule. When many template molecules are copied and tagged, the same MLT sequence can be assigned to more than one template molecule. MLT sequences are used to track the lineage of molecules from initial copying through amplification, processing and sequencing. They can be used to distinguish sequences that arise from polymerase misincorporations or sequencer errors from sequences that are derived from true mutant template molecules. MLTs can also be used to distinguish sequences that have the wrong barcode assignment as a result of cross-over of barcodes during pooled amplification. Because the same MLT sequence can be assigned to more than one template molecule, meaningful analysis of MLT sequences requires first identifying variant target sequences and then analyzing the distribution of MLT sequences associated with those variants.

The term “molecular lineage tagging” refers to the process of assigning molecular lineage tags to nucleic acid template molecules. MLTs can be incorporated within primers, and are attached to copies made from targeted nucleic acids by specific extension of primers on the templates.

The term “include” and its derivations should be understood to mean “including, but not limited to”. The words “a”, “an”, and “the” include both singular and plural referents unless the context indicates otherwise.

Embodiments of the Methods

Methods and compositions are disclosed herein for identifying and quantifying nucleic acid sequence variants. Methods disclosed herein can identify and quantify low-abundance sequence variants from complex mixtures of DNA or RNA. Embodiments of the methods can measure small amounts of tumor-derived DNA that can be found in the circulation of patients with various types of cancer.

Assessment of rare variant DNA sequences is important in many areas of biology and medicine. Small amounts of fetal DNA can be found in the circulation of pregnant women. An embodiment includes analyzing rare fetal DNA that can be used to assess disease-associated genetic features or the sex of the fetus. An organ that is undergoing rejection by the recipient can release small amounts of DNA into the blood, and this donor-derived DNA can be distinguished based on genetic differences between the donor and the recipient. An embodiment includes measuring donor-derived DNA to provide information about organ rejection and efficacy of treatment. In another embodiment, nucleic acids can be detected from an infectious agent (e.g., bacteria, virus, fungus, parasite, etc.) in a patient sample. Genetic information about variations in pathogen nucleic acids can help to better characterize the infection and to guide treatment decisions. For instance, detection of antibiotic resistance genes in the genome of bacteria infecting a patient can direct antibiotic treatments.

Detection and measurement of low-abundance mutations has many important applications in the field of oncology. Since tumors are known to acquire somatic mutations, some of which promote the unregulated proliferation of cancer cells, identifying and quantifying these mutations has become a key diagnostic goal. Companion diagnostics have become an important tool in identifying the mutational cause of cancer and then administering effective therapy for that particular mutation. Furthermore, some tumors acquire new mutations that confer resistance to targeted therapies. Thus, accurate determination of a tumor's mutation status can be a critical factor in determining the appropriateness of particular therapies for a given patient. However, detecting tumor-specific somatic mutations can be difficult, especially if tumor tissue obtained from a biopsy or a resection has few tumor cells in a large background of stromal cells. Tumor-derived mutant DNA can be even more challenging to measure when it is found in very small amounts in blood, sputum, urine, stool, pleural fluid, or other biological samples.

Tumor-derived DNA is released into the bloodstream from dying cancer cells in patients with various types of malignancies. Detection of circulating tumor DNA (ctDNA) has several applications including, but not limited to, detecting presence of a malignancy, informing a prognosis, assessing treatment efficacy, tracking changes in tumor mutation status, and monitoring for disease recurrence or progression. Since unique somatic mutations can be used to distinguish tumor-derived DNA from normal background DNA in plasma, a new class of highly specific DNA-based cancer biomarkers is described with clinical applications that may complement those of conventional serum protein markers. In an embodiment, methods include screening ctDNA for presence of tumor-specific, somatic mutations. In such embodiments, false-positive results are very rare since it would be very unlikely to find cancer-related mutations in the plasma DNA of a healthy individual. Described herein are methods that specifically and sensitively measure rare mutant DNA molecules that are shed into blood from cancer cells. Achieving extremely high detection sensitivity is especially important for detection of a small tumor at an early (and more curable) stage.

Since somatic mutations can occur at many possible locations within various cancer-related genes, a clinically useful test for analyzing ctDNA would need to be able to evaluate mutations in many genes simultaneously, and preferably from many samples simultaneously. In embodiments, analysis of a plurality of mutation-prone regions from a plurality of samples allows more efficient use of large volumes of sequence data that can be obtained using massively parallel sequencing technologies. In an embodiment, labeling molecules arising from a given sample with a sample-specific DNA sequence tag, also known as a barcode or index, facilitates simultaneous analysis of more than one sample. By using distinct barcode sequences to label molecules derived from different samples, it is possible to combine molecules and to carry out massively parallel sequencing on a mixture. Resultant sequences can then be sorted based on barcode identity to determine which sequences were derived from which samples. To minimize chances of misclassification, barcodes are designed so that any given barcode can be reliably distinguished from all other barcodes in the set by having distinct bases at a minimum of two positions.
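The sorting of resultant sequences by barcode identity can be illustrated with a minimal Python sketch. It assumes, purely for the example, that the barcode occupies the first bases of each read; the function name `demultiplex`, the sample labels, and the read sequences are hypothetical and do not describe a specific read layout from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using an exact match on the leading barcode.

    Reads whose leading bases match no known barcode are set aside rather
    than being assigned to the nearest barcode.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes (assumed layout: barcode first, target sequence after).
barcode_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACTTGG", "TGCATGAAGG", "GGGGGGAAAA"]
sorted_reads, unassigned = demultiplex(reads, barcode_to_sample, barcode_len=6)
print(sorted_reads, unassigned)
```

Exact matching is used here because, with barcodes that differ at a minimum of two positions, a read carrying a single barcode error matches no barcode and is discarded instead of being misclassified.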

In most protocols that are currently used to prepare samples for massively parallel sequencing, barcodes are attached after several steps of sample processing (e.g. purification, amplification, end repair, etc.). Barcodes can be attached either by ligation of barcoded sequencing adapters or by incorporation of barcodes within primers that are used to make copies of nucleic acids of interest. Both approaches typically require several processing steps to be performed separately on nucleic acids derived from each sample before barcodes can be attached. Only after barcodes are attached can samples be mixed.

In an embodiment, barcodes are assigned to targeted molecules at a very early step of sample processing. Targeted early barcode attachment not only permits sequencing of multiple samples to be performed in batch, but also enables most subsequent processing steps to be performed in a combined reaction volume. Once barcodes are attached to nucleic acid molecules in a sample-specific manner, molecules can be mixed, and all subsequent steps can be carried out in a single tube. If a large number of samples are analyzed, targeted early barcoding can greatly simplify the workflow. Since all molecules can be processed under identical conditions in a single tube, the molecules would experience uniform experimental conditions, and inter-sample variations would be minimized. In an embodiment, tagging of nucleic acids from different samples can be achieved in consistent proportions and then used to enable quantitative comparisons of nucleic acid concentrations across samples. In addition to quantifying DNA, targeted early barcoding can enable quantifying RNA (e.g., RNA expression levels across different samples). Once barcodes are attached, targeted nucleic acids bearing different sample-specific barcodes can be amplified in a combined reaction volume by competitive end-point PCR, and relative counts of different barcodes in amplified products could be used to quantify associated nucleic acids in various samples. Thus, early barcoding can be used to quantify a total amount of various targeted nucleic acids, and not just variants, across many samples.

In an embodiment, well-defined mixtures of primers are produced containing combinations of sample-specific barcodes and consistent ratios of gene-specific segments. Such primers can be used for targeted early barcoding and subsequent batched sample processing. These primers can also be used for quantitation of DNA or RNA in different samples. In an embodiment, such primers allow parallel processing and analysis of multiple mutation-prone genomic target regions from multiple samples in a simplified and uniform manner.

Embodiments include methods that accurately quantify mutant DNA rather than simply determining its presence or absence. In an embodiment, an amount of mutant DNA provides information about tumor burden and prognosis. Embodiments are capable of analyzing DNA that is highly fragmented due to degradation by blood-borne nucleases as well as due to degradation upon release from cells undergoing apoptotic death. Since somatic mutations can occur at many possible locations within various cancer-related genes, an embodiment can evaluate mutations in many genes simultaneously from a given sample. Embodiments are capable of finding mutations in ctDNA without knowing beforehand which mutations are present in a patient's tumor. An embodiment is able to screen for many different types of cancer by evaluating multiple regions of genomic DNA that are prone to developing tumor-specific somatic mutations. An embodiment includes multiple samples combined together in the same reaction tube to minimize inter-sample variations.

Although the methods described herein have been optimized for measurement of small amounts of mutant circulating tumor DNA (ctDNA) in a background of normal (wild-type) cell-free DNA in the plasma or serum of a patient having cancer, it is understood that they could be applied more broadly to the analysis of nucleic acid variants from a variety of sources. Examples of such sources include, but are not limited to, lymph nodes, tumor margins, pleural fluid, urine, stool, serum, bone marrow, peripheral white blood cells, cheek swabs, circulating tumor cells, cerebrospinal fluid, peritoneal fluid, amniotic fluid, cystic fluid, frozen tumor specimens, and tumor specimens that have been formalin-fixed and paraffin-embedded.

Features:

Methods include identifying and measuring low-abundance variants occurring in multiple mutation-prone regions of genomes from multiple samples in parallel. One aspect includes early attachment of sample-specific DNA barcodes to a plurality of nucleic acid targets that are derived from a plurality of samples. Specifically, a mixture of gene-specific primers, all bearing the same barcode, is used to make tagged copies of several different genomic target regions from nucleic acids in a given sample in a single reaction volume. For each additional sample, this process is repeated in a separate reaction volume using a similar mixture of gene-specific primers bearing a different barcode. All members of a given primer mix have the same sample-specific barcode, but different primer mixes have different barcodes. Once barcodes have been attached, the DNA from multiple samples can be combined into a single volume for further processing.

If many DNA targets from many samples are to be analyzed, large numbers of primers would need to be produced, each having different combinations of barcoded 5′ segments and gene-specific 3′ segments. Targeted early barcoding allows combining nucleic acids from different samples and processing of the nucleic acids together in a combined reaction volume. Batched processing has an advantage of simplified workflow and greater experimental consistency and uniformity across different samples. Batched processing decreases potential quantitative variability arising from very small inter-sample concentration or temperature differences. Although the variability may be small at time of initial input, the end result may have substantial variability due to the exponential nature of PCR. Amplification of differently barcoded nucleic acid copies in a combined reaction volume by competitive end-point PCR followed by high throughput sequencing of the products would allow direct enumeration of the various barcodes associated with a given genomic target region. The relative quantity of each targeted nucleic acid in the different samples could be deduced from the relative abundance of the various barcodes within the sequence data.
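The enumeration of barcodes for a given genomic target region can be sketched as a simple counting step in Python. The function name `relative_abundance` and the barcode labels are hypothetical, and the sketch assumes each sequence has already been assigned a barcode call for the target region of interest.

```python
from collections import Counter


def relative_abundance(barcode_calls):
    """Fraction of sequences for one target region carrying each barcode."""
    counts = Counter(barcode_calls)
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


# Hypothetical barcode calls for one genomic target region after pooled PCR.
calls = ["BC1"] * 3 + ["BC2"] * 1
fractions = relative_abundance(calls)
print(fractions)
```

Under the assumption that all barcoded copies amplify with equal efficiency in the combined reaction, these fractions would approximate the relative amounts of the target in the corresponding samples.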

Another aspect includes producing primers by combining modular oligonucleotide segments. Implementing targeted early barcoding requires generating well-defined mixtures of large numbers of primers. Primer mixtures are produced in such a way that each mixture contains identical proportions of 3′-gene-specific segments, ensuring that target nucleic acids from different samples are copied in consistent ratios. This makes it possible to quantitatively compare nucleic acid concentrations across different samples. In an embodiment, combining modular oligonucleotide segments is used. More specifically, to generate each mixture, a portion of a uniform pool of various gene-specific 3′ oligonucleotide segments is joined to a single, uniquely-barcoded 5′ segment. Since the 3′ segments used to produce each final mixture are derived from a common pool (or master-mix), each uniquely barcoded primer mix has similar proportions of the different 3′ gene-specific segments. Several approaches are described herein for joining the modular 5′ and 3′ segments. This modular approach to producing primer mixes allows the production of thousands of primer and barcode combinations that would have otherwise been very costly and laborious to produce. Furthermore, the consistency of gene-specific primer ratios that can be achieved across different mixes would not be possible by mixing individually synthesized primers.

Methods described herein utilize next-generation, high-throughput DNA sequencing technologies to identify and quantify nucleic acid variants. These technologies are able to quickly and inexpensively produce sequences from millions of DNA molecules in a massively parallel fashion. By oversampling sequences of a large number of DNA molecules from a particular genomic region using ultra-deep sequencing, it would be possible to identify and enumerate rare sequence variants. However, the sensitivity of the sequencing is limited by the inherent error rate of the sequencer, since incorrectly read bases might be mistaken for true mutant DNA copies. Mutant ctDNA has been reported to comprise on average 0.2% of total plasma DNA (Diehl et al., Nat Med. 2008; 14: 985-990), a range in which sequencer misreads can be problematic. This limits the ability of massively parallel sequencing to measure very low-abundance mutations.

Herein methods are described that use clonal overlapping paired-end sequencing to achieve sequence redundancy in mutation-prone regions, thereby allowing base calls to be made with much greater confidence. Embodiments include methods of reducing, suppressing, and distinguishing sequencer-derived errors. Using an Illumina® next-generation sequencing platform, an embodiment includes obtaining a read in one direction from a clonal cluster of DNA molecules, and then subsequently obtaining a read in the opposite direction (from the opposite strand of the duplex). The length of each “paired-end” read can be 36, 50, 75, 100, or 150 bp or longer. An embodiment includes sequencing short PCR amplicons in a paired-end fashion to obtain overlapping reads from both strands of a clone. By designing the mutation-prone region to be in the area of sequence overlap, clonal sequence redundancy can be achieved in this region. Thus, each clonal sequence from a mutation-prone region is read in one direction, and then is proofread in the other direction. Read-pairs that do not have perfect agreement in the overlapping region (after obtaining a reverse-complement of one of the reads) can be attributed to sequencer error, and can be ignored in the final analysis. In this way, sequencer-generated errors in a region of interest can be reduced since a probability of finding the same sequencer error in reads from both strands of a clone is exceedingly low. By reducing the background of sequencer errors, it becomes possible to achieve better detection sensitivity for rare mutant molecules. Detection sensitivity is especially important in patients with early-stage cancers who are likely to have a very low concentration of mutant ctDNA molecules in their blood.
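The proofreading of one read against the reverse-complement of its paired read can be sketched in Python as follows. The sketch assumes, for illustration only, that the first read starts at the first base of the amplicon, that the second read starts at the last base on the opposite strand, and that the amplicon length is known; the function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse-complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_consensus(read1: str, read2: str, amplicon_len: int):
    """Return the overlapping sequence if both reads agree on it, else None.

    read1 covers amplicon positions [0, len(read1)); read2, once
    reverse-complemented, covers [amplicon_len - len(read2), amplicon_len).
    """
    if len(read1) > amplicon_len or len(read2) > amplicon_len:
        raise ValueError("reads cannot be longer than the amplicon")
    read2_fwd = reverse_complement(read2)
    read2_start = amplicon_len - len(read2)
    overlap_len = len(read1) - read2_start
    if overlap_len <= 0:
        return None  # the two reads do not overlap
    segment1 = read1[read2_start:]
    segment2 = read2_fwd[:overlap_len]
    # Any disagreement is treated as sequencer error and the pair is dropped.
    return segment1 if segment1 == segment2 else None


# Hypothetical 14-bp amplicon read with two 10-base paired-end reads.
amplicon = "ACGTTGCAAGGCTA"
read1 = amplicon[:10]
read2 = reverse_complement(amplicon[4:])
print(overlap_consensus(read1, read2, len(amplicon)))
```

Requiring perfect agreement is the strictest form of the filter described above; a practical pipeline might instead restrict the comparison to the mutation-prone positions within the overlap.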

Another aspect includes distinguishing nucleotide misincorporation errors that can be introduced during DNA copying, amplification, or processing. After suppression of sequencer-derived errors, variant sequences are still found that do not correspond to authentic mutations arising from mutant template DNA molecules. A majority of these variant sequences arise from incorporation of incorrect nucleotides when DNA template molecules are copied or amplified. Possible causes of such misincorporation errors include, but are not limited to, DNA damage (for example, cytosine deamination during heating) or polymerase-induced errors.

To distinguish variant sequences arising from true mutant template molecules from those arising from misincorporation errors, an embodiment includes molecular lineage tagging. In molecular lineage tagging, a degenerate sequence called a molecular lineage tag (MLT) is incorporated into primers that make a small number of copies (between 2 and 20) of an original template DNA molecule. An MLT is a stretch of degenerate sequence having an approximately equal probability of having A, T, C, or G at each position and can be about 2 to about 10 bases in length, but preferably would be 6, 7, or 8 bases long. An MLT sequence can also be split into segments that are separated by non-degenerate positions within an oligonucleotide.

It is not necessary that each template molecule be tagged with a unique MLT, but only that each template molecule should have a low probability of being tagged with any given MLT sequence. For example, if the MLT region consisted of 8 degenerate positions, then 4^8 = 65,536 possible MLT sequences could be generated. MLT-containing primers are used to make a limited number of copies of the template DNA molecules, via either a few cycles (2 to 4) of PCR or primer-extension. Thus, each template copy would be tagged with one of 65,536 possible MLT sequences. When these tagged copies are amplified by PCR, the “progeny” molecules derived from amplification of a given “parent” copy should retain the same identifying MLT sequence as the parent molecule. If a variant sequence arose from a true mutant template molecule, then many copies of a given MLT sequence should be associated with that variant sequence (since that MLT was associated with the mutant copy at the beginning of the amplification process). On the other hand, if an error was introduced during amplification or processing, one would expect a smaller number of copies of a given MLT to be associated with the erroneous variant sequence (unless the error occurred at a very early cycle of amplification). It is important to note that if several thousand template molecules are tagged with MLTs, there is a high probability that some MLT sequences may be assigned to more than one template molecule.
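The arithmetic behind the preceding paragraph can be checked with a short Python sketch. It assumes that every MLT sequence is assigned independently and with equal probability, which is an idealization; the function names and the figure of 5,000 templates are illustrative assumptions only.

```python
def mlt_diversity(n_positions: int, bases_per_position: int = 4) -> int:
    """Number of distinct MLT sequences for a fully degenerate tag."""
    return bases_per_position ** n_positions


def prob_some_shared_tag(n_templates: int, n_tags: int) -> float:
    """Probability that at least two templates receive the same tag
    (birthday-problem calculation under uniform, independent tagging)."""
    p_all_distinct = 1.0
    for i in range(n_templates):
        p_all_distinct *= max(0.0, 1.0 - i / n_tags)
    return 1.0 - p_all_distinct


n_tags = mlt_diversity(8)  # 4**8 = 65536
print(n_tags)
print(prob_some_shared_tag(2, n_tags))      # two templates: 1 / 65536
print(prob_some_shared_tag(5000, n_tags))   # several thousand templates
```

With only two templates a shared tag is very unlikely, whereas with several thousand templates at least one shared tag is nearly certain, which is consistent with analyzing MLT distributions per variant rather than treating each MLT as a unique identifier.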

With non-unique MLTs, it is less informative to evaluate the percentage of mutant and wild-type sequences associated with a particular MLT sequence. Rather, it is preferable to identify mutant sequences, and then to evaluate the distribution of MLT sequences associated with those variants. If the number of sampled clonal sequences (post-amplification) is several-fold greater than the number of tagged template copies, then variant sequences arising from true mutant template molecules would be associated with multiple copies of a given MLT sequence, whereas variants arising from misincorporation errors would be likely to be associated with fewer copies of any given MLT. Analysis of MLT distributions (number of different MLT sequences and number of copies of each sequence) associated with a particular variant makes it possible to identify the majority of variants arising from misincorporation errors, thereby further improving the sensitivity for detecting true template-derived mutations.
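The variant-first analysis of MLT distributions described above can be sketched in Python. The function names, the input format of (MLT, sequence call) pairs, and the `min_copies` cutoff are assumptions made for the example; the disclosure does not specify a particular threshold.

```python
from collections import Counter


def mlt_distribution(observations, variant):
    """Count copies of each MLT among clonal sequences carrying the variant.

    `observations` is an iterable of (mlt, call) pairs, one per clonal read.
    """
    return Counter(mlt for mlt, call in observations if call == variant)


def supported_mlts(observations, variant, min_copies):
    """MLTs seen with the variant at least `min_copies` times.

    A variant with no such MLT is more consistent with a misincorporation
    error than with a true mutant template (min_copies is hypothetical).
    """
    distribution = mlt_distribution(observations, variant)
    return {mlt: n for mlt, n in distribution.items() if n >= min_copies}


# Hypothetical clonal reads: one well-copied MLT and one singleton.
observations = (
    [("AACCGGTT", "mutant")] * 6
    + [("TTGGCCAA", "mutant")] * 1
    + [("AACCGGTT", "wild-type")] * 3
)
print(mlt_distribution(observations, "mutant"))
print(supported_mlts(observations, "mutant", min_copies=3))
```

In this example the variant is retained because one MLT is associated with it in multiple copies, while the singleton MLT would be treated as a probable error-derived sequence.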

Another aspect includes distinguishing sequences that are misclassified as belonging to a wrong sample. Such incorrect classification of a sequence can occur if it is associated with an inappropriate barcode. Since barcodes are designed to differ from all other barcodes in a set at a minimum of two distinct positions, misclassification due to barcode sequence errors would be rare. However, cross-over of barcodes has been observed from differently barcoded molecules that undergo combined polymerization or amplification in the same reaction volume. This can happen, for example, if primer-extension stalls before a polymerase has completed extending on a template during a given cycle of PCR. That partially-extended strand (possibly containing a mutant or wild-type sequence) could then anneal to a different template during the next cycle of PCR, and could incorporate an inappropriate barcode. Alternatively, if two strands of DNA containing different barcodes are annealed to each other via a common complementary sequence, the 3′-5′ exonuclease activity of a proofreading polymerase can digest the barcode on one strand and then extend that strand using the opposite strand's barcode as a template. MLT sequences can be used to distinguish sequences derived from such barcode “cross-over” events. If an MLT region is positioned in proximity to or adjacent to a barcode sequence, then it can be used to track the lineage of the barcode. If a variant is tagged with an inappropriate barcode as a result of cross-over during the process of amplification, then one would expect fewer than average copies of a particular MLT sequence to be associated with that barcode/variant combination. To further aid in distinguishing cross-over sequences, a second MLT can be positioned on the opposite side of the mutation-prone region (so that the sequence order, for example, could be MLT-1/Barcode/mutation-prone region/MLT-2). In this case, DNA molecules that undergo cross-over between a barcode and a mutation-prone region would also undergo cross-over between MLT-1 and MLT-2. Thus, such crossed-over sequences could be identified because the number of copies of a particular MLT-1/MLT-2 combination would be lower than for sequences that did not undergo cross-over. Thus, MLT sequences can allow differently barcoded molecules to be amplified in a combined reaction volume while maintaining accurate assignment of mutations to specific samples.
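One possible way to flag candidate cross-over sequences from MLT-1/MLT-2 copy numbers is sketched below in Python. The read representation as (MLT-1, barcode, MLT-2) tuples, the function name, and the cutoff expressed as a fraction of the median copy number are all assumptions for illustration and are not specified in the disclosure.

```python
from collections import Counter
from statistics import median


def flag_crossover_candidates(reads, fraction_of_median=0.2):
    """Return MLT-1/barcode/MLT-2 combinations with unusually few copies.

    Each read is a (mlt1, barcode, mlt2) tuple. A combination is flagged
    when its copy number is below a hypothetical fraction of the median
    copy number across all observed combinations.
    """
    combos = Counter(reads)
    cutoff = fraction_of_median * median(combos.values())
    return {combo for combo, n in combos.items() if n < cutoff}


# Hypothetical reads: two well-copied lineages and one rare recombinant.
reads = (
    [("AAAA", "BC1", "CCCC")] * 10
    + [("GGGG", "BC2", "TTTT")] * 12
    + [("GGGG", "BC2", "CCCC")] * 1
)
print(flag_crossover_candidates(reads))
```

Here the rare combination pairs the MLT-1 and barcode of one lineage with the MLT-2 of another, which is the pattern expected when cross-over occurs between the barcode and the mutation-prone region.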

Another aspect includes highly-specific tagging, copying, and amplification of several genomic target regions from several samples simultaneously in a single reaction volume while minimizing accumulation of unwanted, spurious amplification products. Such highly multiplexed processing and amplification is prone to accumulation of spurious products because of the presence of large numbers of different primers. Having a complex mixture of primers with different combinations of barcodes, degenerate sequence regions, and gene-specific regions in a single PCR amplification can lead to formation of many primer dimers and non-specific amplification products. An embodiment includes multi-step tagging and amplifying without having to compromise primer concentrations. An embodiment of a process includes highly stringent purification of desired amplification products between each amplification step to remove unextended primers, spurious extension products, and genomic template DNA as well as enzyme, buffer, and nucleotides. An embodiment utilizes biotin-tagged oligonucleotides to mediate specific isolation of desired products. Another embodiment utilizes high-temperature washes when using biotin-tagged oligonucleotides. Another embodiment includes digesting unwanted single-stranded products and primers with an exonuclease to further improve amplification specificity. An embodiment also uses nested primers to provide further selectivity for desired products. An embodiment includes universal PCR primers for the final amplification. Under the stringent conditions described herein, universal PCR primers can be used for the final amplification without significant accumulation of spurious products.

Methods: Producing Combinations of Modular Oligonucleotide Segments for Tagging of Nucleic Acids

In an embodiment, tagged copies of multiple nucleic acid targets are made from template DNA or RNA derived from a given sample. To produce such tagged copies, a mixture of primers is used in which the 3′-segments of the primers are able to hybridize to RNA or DNA targets by sequence complementarity (as illustrated, for example, by the reverse primers 1 in FIG. 1). A polymerase (such as a reverse transcriptase or a DNA polymerase) can then be used to extend the primers in the 5′ to 3′ direction using the targeted nucleic acids as templates. In an embodiment, a sample-specific DNA barcode sequence can be incorporated into the 5′-segment of each primer such that the barcode becomes attached to the copy of the target template after undergoing primer-extension. Since multiple target templates are to be copied from a given sample in a single reaction tube, a mixture of primers is required having various target-specific sequences in their 3′-segments and all having the same sample-specific barcode sequence in their 5′-segments. If several different samples are to be analyzed, then similar mixtures of primers must be made for each sample, with each mixture containing a unique, sample-specific barcode sequence in the 5′-segment. In some embodiments, the 5′-segment of each primer can also contain other elements such as sequencing adapters (to facilitate sequencing of the copied DNA), binding sites for PCR primers, or stretches of degenerate sequence (having equal probability of A, C, T, or G bases at each position) that can serve as tags to follow the lineage of molecules during copying, amplification, and sequencing.

In an embodiment, a barcode comprises a unique sequence (typically 6 to 12 nucleotides long) that is used to identify molecules derived from a particular sample after molecules from multiple samples are pooled and sequenced in batch. In an embodiment, a computer program can be used to sort clonal sequences derived from each molecule based on barcode identity. In order to minimize the chance that a sequence derived from one sample might be misclassified as being derived from another sample, each barcode sequence is designed to differ from all other barcodes in the set by at least 2 nucleotides (so that a single sequencing error would not lead to misclassification).

In an embodiment, multiple gene-specific primer regions (at the 3′-ends of primers) are attached, in separate batches, to unique sample-specific barcodes (near the 5′-regions of primers). If many genomic targets are to be analyzed from many samples, the number of combinations of primer 3′-ends and 5′-ends can become very large. For example, if 40 target gene regions are to be evaluated from 96 different samples, 40×96=3,840 different oligonucleotides would need to be made, each with a unique combination of 3′ gene-specific sequence and 5′ barcode. If conventional oligonucleotides were individually synthesized, a mixture of 40 different gene-specific primers having a particular barcode would be used to primer-extend nucleic acid targets from a given sample within a single tube. Thus, all 40 target regions would be tagged with the same sample-specific barcode. However, synthesis and purification of 3,840 oligonucleotides individually would be impractical. Because termination sequences would be abundant when making long primers, full-length oligonucleotides would have to be purified by methods including but not limited to polyacrylamide gel electrophoresis, high performance liquid chromatography, or reverse-phase cartridge purification.

To address the need for producing uniform mixtures of multiple gene-specific primers, with each mixture having a unique barcode sequence, embodiments are described in which combinations of modular gene-specific 3′ oligonucleotide segments can be attached to a modular barcoded 5′ oligonucleotide segment. In various embodiments, production of modular oligonucleotides allows multiple gene-specific 3′ segments to be synthesized and uniformly mixed, and then attached in separate batches to each barcoded 5′ segment (FIG. 2). Resulting uniquely barcoded primer mixes would each have consistent ratios of gene-specific primer sequences. In some embodiments, such uniform primer mixes could be used to copy and label DNA or RNA molecules from different samples with consistent efficiency (so that the resulting tagged copies would be proportionate to the amount of target nucleic acids in each sample). Subsequent pooled amplification of differently barcoded molecules by competitive PCR, followed by sequencing to count barcoded sequences, would enable accurate quantitation of DNA or RNA targets in the various samples. Such uniform primer mixes would be very difficult to achieve by simply mixing individually synthesized primers.
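The combinatorial benefit of the modular approach can be illustrated with a short Python sketch using the 40-target, 96-sample example given earlier. The segment names are hypothetical placeholders rather than real oligonucleotide sequences, and the count of 136 modular segments is simple arithmetic (40 + 96) rather than a figure stated in the disclosure.

```python
# Hypothetical placeholder names standing in for oligonucleotide segments.
gene_segments = [f"gene_3prime_{i:02d}" for i in range(1, 41)]        # 40 targets
barcode_segments = [f"barcode_5prime_{j:02d}" for j in range(1, 97)]  # 96 samples


def build_primer_mixes(barcodes, gene_pool):
    """Join every 3' segment from one shared pool to each barcoded 5' segment,
    giving one primer mix per barcode with the same gene-specific content."""
    return {bc: [(bc, gene) for gene in gene_pool] for bc in barcodes}


primer_mixes = build_primer_mixes(barcode_segments, gene_segments)

# Distinct full-length primers represented across all mixes: 40 x 96.
full_length_primers = sum(len(mix) for mix in primer_mixes.values())
# Segments that must actually be synthesized in the modular scheme: 40 + 96.
modular_segments = len(gene_segments) + len(barcode_segments)
print(full_length_primers, modular_segments)
```

Because every mix draws its 3′ segments from the same pool, the gene-specific composition is identical across mixes in this idealized model, which mirrors the consistent ratios described above.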

In some embodiments, multiple 3′ oligonucleotide segments are produced and mixed, and then the mixture is joined in separate batches to unique 5′ oligonucleotide segments. In an embodiment (FIG. 3), different 3′-segments can be synthesized in separate columns on an automated oligonucleotide synthesizer (on solid supports). Synthesis can then be paused and the solid supports from the different columns can be uniformly mixed. Then the mixture can be dispensed into several fresh columns. Synthesis can then be continued, adding a uniquely barcoded 5′ segment to each new column. After cleavage, deprotection, and purification, the desired uniquely barcoded uniform primer mixtures are obtained. In another embodiment, mixtures of 5′-phosphorylated 3′-segments can be ligated to different barcoded oligonucleotides using splint-mediated enzymatic ligation. In another embodiment, primer extension can be used to produce combinations of modular segments. In this approach, a single barcoded 5′ oligonucleotide can be hybridized via a common sequence to a mixture of complementary templates. These templates can be designed to produce various gene-specific 3′-ends when the barcoded oligonucleotide is primer extended on these templates. In an embodiment, biotin tags on the templates can be used to separate the templates from the desired uniquely barcoded primer mix. Similar primer extension reactions can be performed in separate reaction volumes using different barcoded oligonucleotides to produce several similar uniform primer mixes. In yet another embodiment, mixtures of 3′ oligonucleotide segments having reactive chemical conjugation moieties on their 5′-ends can be combined in separate batches with uniquely barcoded 5′ oligonucleotide segments having reactive conjugation moieties on their 3′-ends. Such chemical conjugation would allow post-synthesis combination of oligonucleotide segments. Special conjugation chemistries have been previously described that can conjugate two oligonucleotide segments together leaving a phosphodiester bond at the junction (or similar bond that would be compatible with subsequent enzymatic processes).

Isolation of Template DNA

Embodiments provide methods for purification or isolation of DNA or RNA from various clinical or experimental specimens. Many kits and reagents are commercially available to facilitate nucleic acid purification. Depending on the type of sample to be analyzed, appropriate nucleic acid isolation techniques can be selected. Substances that might inhibit subsequent enzymatic reaction steps (such as polymerization) should be removed or reduced to non-inhibitory concentrations in purified DNA or RNA samples. The yield of nucleic acid should be maximized whenever possible, because DNA lost during purification might include rare variant DNA. When isolating DNA from plasma, about 10 ng to 100 ng of cell-free DNA can be purified from 1 mL of plasma, which corresponds to 3,500 to 35,000 genome copies. Of note, DNA yields can vary dramatically, especially in patients with an ongoing disease process such as cancer.

In an embodiment, DNA can also be analyzed from other sample types, including but not limited to the following: pleural fluid, urine, stool, serum, bone marrow, peripheral white blood cells, circulating tumor cells, cerebrospinal fluid, peritoneal fluid, amniotic fluid, cystic fluid, lymph nodes, frozen tumor specimens, and tumor specimens that have been formalin-fixed and paraffin-embedded.

Producing a Limited Number of Tagged Copies of Targeted Nucleic Acids

In an embodiment, a limited number of tagged copies (e.g., fewer than 20) of targeted nucleic acid molecules are made at an early step in the process. After DNA or RNA is purified from the original sample, targeted nucleic acid template molecules can be copied by specifically hybridizing tagged primers and extending them by polymerization. When a plurality of target regions are to be copied and tagged from a given sample, a mixture of modular barcoded primers can be used (as described above). In an embodiment, targeted nucleic acid regions are mutation-prone regions (also called mutation hotspots). A mixture of primers for a given sample can contain sequences at their 3′-ends that specifically hybridize to an area of DNA near or adjacent to a target region. All primers used for a given sample would have the same sample-specific barcode sequence, and different samples would have different barcodes. In some embodiments, the primers can also contain stretches of degenerate bases known as molecular lineage tags (MLTs) that can be helpful in distinguishing sequences arising from true mutant template molecules versus those arising from misincorporation errors occurring during amplification or processing. The MLTs can also help to identify sequences that are assigned to the wrong barcode due to cross-over of barcodes during pooled amplification of differently barcoded molecules. In an embodiment, primers can also contain adapter sequences that are necessary for sequencing, and universal primer binding sites that can be used in subsequent amplifications.

In an embodiment, a DNA polymerase can be used to extend the primers on hybridized templates, thus producing copies of the target nucleic acids with sample-specific barcodes attached. A DNA polymerase can be a thermostable or non-thermostable enzyme, and may or may not have proofreading activity. Examples of polymerases include, but are not limited to, Taq, Phusion®, VentR®, Pfu, Pfx, DNA Polymerase I (Klenow fragment), and reverse transcriptase. When specific primer annealing and extension is to be carried out at temperatures above 50° C., thermostable polymerases with hot-start capability are preferred in order to minimize spurious polymerization at room temperature during reaction set-up. Copies of template nucleic acids can be made by a single primer extension step, by a few cycles of primer extension (1 to 10 cycles, with heat-denaturation of the extended products between cycles), or by a few cycles of PCR in which opposite primers are also added (2 to 5 cycles). A few tagged copies of each template molecule can be produced so that a complete sampling of sequences can be obtained even if there is some loss of copies during the various purification, processing, and amplification steps. However, the number of tagged copies must be limited to avoid assigning too many different MLTs to each template molecule, which would require greater sequencing depth for analysis. In an embodiment, after a limited number of tagged copies are made, the polymerase is inactivated, and barcoded copies from different samples can then be pooled into a combined volume for further processing.

Purification of Tagged Copies

In an embodiment, tagged, primer-extended copies of target sequences are purified away from un-extended and non-specifically extended primers and from excess template nucleic acids. Purification also removes other reaction components such as buffer, dNTPs, and polymerase. Removal of un-extended primers and non-specifically extended primers is preferred so that they are not carried over to the next polymerization step. Also, removal of excess primers and template molecules allows greater specificity of polymerization in subsequent steps.

In an embodiment, purification of specifically tagged and extended products is mediated by capture using biotin-labeled complementary oligonucleotides that hybridize to the specifically extended products. Oligonucleotides can be designed to anneal to sequences produced when tagged primers are extended beyond the mutation-prone region (or target region). Such hybridization of the biotin-labeled capture oligonucleotides to the extended tagged copies can be achieved either by using the biotinylated primers in PCR (FIG. 1), or by subsequently annealing them to primer-extended copies (FIG. 10). In an embodiment, immobilized streptavidin (or an analogue with affinity for biotin) is used to isolate and purify the tagged, extended copies that hybridize to the capture oligonucleotides. Immobilized streptavidin is available in many forms, including but not limited to surface-bound, agarose bead-bound, magnetic bead-bound, or filter-bound. In some embodiments, removal of non-specifically annealed nucleic acids can be achieved by washing the bound molecules at room temperature or at elevated temperatures that would selectively disrupt short, non-specific stretches of hybridization, but would not disrupt specifically-annealed products. In some embodiments, nuclease treatment of the bound molecules can also be used to digest non-specifically annealed products. Nucleases that could be used include but are not limited to Exonuclease I, Exonuclease VII, and RecJf. In some embodiments, elution of specifically-annealed copies can be achieved by heat-denaturation or by alkaline-denaturation to separate biotin-labeled strands from the desired single-stranded tagged copies. Biotin-labeled strands should remain attached to the immobilized streptavidin since the biotin-streptavidin interaction is not substantially disrupted by heat or moderate alkaline conditions.

In another embodiment, specifically primer-extended copies can be purified by carrying out limited cycles of PCR and then digesting single-stranded nucleic acids to remove un-extended primers. In yet another embodiment, oligonucleotides can be specifically hybridized to primer-extended products to protect their 3′-ends from digestion by a 3′ to 5′ single-stranded exonuclease such as Exonuclease I.

Double-stranded products that survive digestion can be purified by a variety of approaches, including but not limited to ethanol precipitation, silica membrane partitioning, or binding to magnetic Solid Phase Reversible Immobilization (SPRI) beads.

Second Round of Tagging, Copying, and Purification

In an embodiment, the tagged, pooled, and purified DNA copies from multiple samples can be subjected to another round of limited-cycle primer-extension or limited-cycle PCR (similar number of cycles as described for the first round). Primers used in this second round would be designed to incorporate MLTs on the opposite side of the mutation-prone region relative to the MLTs incorporated in the first round (FIG. 1). This second MLT region could be used to distinguish sequences arising from barcode cross-over events that occurred during pooled amplification or processing. Use of nested primers in the second round of PCR or primer-extension would provide additional selectivity for the targeted genomic sequences. In an embodiment, primers used in the second round could contain universal primer binding sites that would be used for subsequent amplification with universal primers. In an embodiment, primers could also contain adapter sequences that facilitate sequencing using a next-generation sequencer.

In an embodiment, a limited number of specifically primer-extended copies produced in the second round could be purified away from un-extended or non-specifically extended primers and other reaction components using similar approaches as described for the first round. In an embodiment, purification can be achieved using biotinylated capture oligonucleotides designed to specifically hybridize to sites on the opposite primer-extended strands (relative to the hybridization sites of the biotinylated oligonucleotides used in the first round). In an embodiment, nuclease treatment may be used to digest un-extended or non-specifically extended primers.

Amplification and Purification of Specifically Copied and Tagged Products

In an embodiment, products from the first two rounds of copying, tagging, and purification are used as templates for further PCR amplification. In an embodiment, universal primers are used for PCR that are designed to bind to sequences introduced by primers in the first two rounds. Since universal primers are used, it is very important that only desired targeted products remain as templates for the final PCR after the second-round purification. Presence of even small amounts of primer dimers or other spurious products could lead to competitive amplification of undesired templates by the universal primers. In an embodiment, this round of PCR can be carried out for a larger number of cycles than were used in the first 2 rounds. A total of 5 to 40 PCR cycles may be used, depending on the amount of template nucleic acid present and the number of samples being multiplexed. A final PCR is designed to produce sufficient DNA as required for massively parallel sequencing (which can differ depending on the sequencing platform being used). In some embodiments, a final PCR may not be necessary if the required input of the sequencer is satisfied by the amount of DNA product generated after the first 2 rounds. In some embodiments, the DNA products are gel-purified to select products of the desired size and to eliminate unused primers before being subjected to massively parallel sequencing. In some embodiments, other approaches to purification could be used, including but not limited to high-performance liquid chromatography, capillary electrophoresis, silica membrane partitioning, or binding to magnetic Solid Phase Reversible Immobilization (SPRI) beads.

Massively Parallel Sequencing and Data Analysis

In an embodiment, a next-generation sequencer is used to obtain large numbers of sequences from the tagged, amplified, and purified products. Clonal sequences (each sequence arising from a single nucleic acid molecule) produced by such a sequencer can be used to identify and quantify variant molecules using an approach known as ultra-deep sequencing. In principle, because large numbers of sequences can be obtained for each target site and for each sample, rare variants can be detected and measured. However, the error rate of the sequencer can limit the sensitivity of detection because such errors might be mistaken for true variants. To minimize the contribution of sequencer errors, an embodiment uses clonal overlapping paired-end sequences. By separately sequencing opposite strands of DNA from each clonal population, and comparing the overlapping regions of the sequences, the vast majority of variants arising from sequencer errors can be eliminated. In an embodiment, the region of sequence overlap is designed to be in the mutation-prone area. In an embodiment, only read pairs that perfectly match in the overlapping region are retained for further analysis. For such analysis, instruments that produce clonal paired-end reads (such as the Illumina platform) are preferred. In some embodiments, other massively parallel sequencing platforms that provide sequence redundancy can also be utilized.

In an embodiment, errors introduced into the DNA during amplification or processing can be distinguished from true template-derived mutant sequences by analyzing the distribution of molecular lineage tags (MLTs) associated with variant sequences. In an embodiment, MLTs can also be used to distinguish sequences bearing incorrect barcodes due to cross-over events during pooled amplification.
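One possible way to analyze the distribution of MLTs is sketched below in Python, purely as an illustration. The specification does not prescribe a particular rule; the function name, the thresholds `min_reads` and `min_fraction`, and the example reads are all hypothetical. The sketch assumes that reads sharing an MLT descend from one tagged copy, so a template-derived variant should appear in nearly all reads of that family, whereas a later misincorporation should appear in only some of them.

```python
from collections import Counter, defaultdict


def classify_mlt_families(reads, variant, min_reads=3, min_fraction=0.9):
    """Group reads by MLT and label each family.

    reads:   iterable of (mlt_sequence, allele_at_hotspot) tuples
    variant: the allele of interest at the hotspot position
    The thresholds are illustrative assumptions, not values from the text.
    """
    families = defaultdict(Counter)
    for mlt, allele in reads:
        families[mlt][allele] += 1

    calls = {}
    for mlt, allele_counts in families.items():
        total = sum(allele_counts.values())
        n_variant = allele_counts[variant]  # Counter returns 0 if absent
        if total < min_reads:
            calls[mlt] = "too_few_reads"
        elif n_variant / total >= min_fraction:
            calls[mlt] = "variant"       # consistent across the lineage
        elif n_variant == 0:
            calls[mlt] = "reference"
        else:
            calls[mlt] = "likely_error"  # variant in only part of the lineage
    return calls


# Hypothetical reads: (MLT, base observed at the hotspot).
example_reads = (
    [("ACGT", "T")] * 5
    + [("GGCA", "G")] * 4 + [("GGCA", "T")]
    + [("TTAG", "G")] * 2
)
calls = classify_mlt_families(example_reads, variant="T")
print(calls)
```

Counting families labeled "variant" rather than individual reads would then give an estimate of variant template copies that is less sensitive to amplification errors, under the stated assumptions.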

The present technology may be better understood by reference to the following examples. These examples are intended to be representative of specific embodiments of the invention and are not intended to limit the scope of the invention.

EXAMPLES

Example 1

This example demonstrates application of a deep sequencing approach in which 3 mutation hotspot regions were analyzed from multiple plasma samples. The method in this example includes redundancy within each clonal sequence to produce extremely high quality base-calls in short, mutation-prone regions of plasma DNA. Amplification of both mutated and wild-type sequences was carried out by unbiased PCR in the same tube, ensuring highly accurate and reproducible quantitation. The scheme was designed to have flexibility to simultaneously analyze mutations in several genes from multiple patient samples, making it practically feasible to screen plasma samples for mutant ctDNA without prior knowledge of the tumor's mutation profile.

Materials and Methods

Patient Plasma and Tumor Samples

Under the approval of the Human Investigation Committees at the Yale School of Medicine and at Lawrence & Memorial Hospital, plasma samples were obtained from 30 patients with stage I-IV non-small cell lung cancer (NSCLC) between July 2009 and July 2010. Informed consent was obtained from all patients. Most patients were recruited in the radiation oncology clinic, and underwent treatment with radiation therapy, chemotherapy, targeted systemic therapy, and/or surgery. Whenever possible, blood samples were collected from patients before starting the current course of treatment and then at subsequent times during and after treatment. A total of 117 samples were obtained. Formalin-fixed, paraffin-embedded tumor specimens were obtained for all patients with non-squamous histology whose tumors had not already been tested for mutations by a clinical laboratory, and for whom sufficient tissue was available in the block after standard pathology evaluation.

Extraction and Amplification of Plasma DNA

Blood was collected in EDTA-containing tubes (Becton Dickinson) and was centrifuged at 1000 g for 10 minutes within 3 hours of collection. Plasma was transferred to cryovials, with care taken to avoid the buffy coat, and was stored at −80° C. until further processing. Frozen plasma aliquots were thawed to room temperature, and DNA was purified using the QIAamp® DNA Blood Mini kit (Qiagen Sciences, Valencia, Calif.) as per the manufacturer's instructions. 5 μg of carrier RNA was added to each 200 μL plasma sample as recommended to improve adsorption of low-concentration nucleic acids to the silica membrane.

Purified plasma DNA was then subjected to 2 rounds of amplification by PCR (in triplicate) using primers designed to amplify short DNA segments that included codons 12 and 13 of KRAS, codon 858 of EGFR, and codon 600 of BRAF. The sequences of the primers used in both rounds of PCR are listed in Table 1.

TABLE 1 Primers used in first and second rounds of PCR.

PCR Primer              Sequence (5′→3′)                               SEQ ID NO:
Round 1 Forward KRAS*   GGCCTGCTGAAAATGACTGAATATAAAC                   1
Round 1 Reverse KRAS*   TTCGTCCACAAAATGATTCTGAATTAGC                   2
Round 1 Forward BRAF*   TCATGAAGACCTCACAGTAAAAATAGGTG                  3
Round 1 Reverse BRAF*   CACAAAATGGATCCAGACAACTGTTC                     4
Round 1 Forward EGFR*   GTACTGGTGAAAACACCGCAGCAT                       5
Round 1 Reverse EGFR*   CTTACTTTGCCTCCTTCTGCATGGTATT                   6
Round 2 Forward KRAS    GG[FWD BC]CGAACAGTCTCCGAATATAAACTTGTGGTAGTTGG  7
Round 2 Reverse KRAS    GC[REV BC]GGATGAGTGCAGTGAATTAGCTGTATCGTCAAG    8
Round 2 Forward BRAF    GG[FWD BC]CGAACAGTCTCCAAATAGGTGATTTTGGTCTAGC   9
Round 2 Reverse BRAF    GC[REV BC]GGATGAGTGCAGCCAGACAACTGTTCAAACTGA    10
Round 2 Forward EGFR    GG[FWD BC]CGAACAGTCTCCCAGCATGTCAAGATCACAGATT   11
Round 2 Reverse EGFR    GC[REV BC]GGATGAGTGCAGGCATGGTATTCTTTCTCTTCC    12

*Primers were gel-purified prior to use.
FWD BC = Forward barcode
REV BC = Reverse barcode

In a first round of PCR, all hotspot regions from a given sample were amplified in a multiplexed fashion. Three aliquots of purified plasma DNA from each sample were used as templates in three identical multiplexed PCRs containing 1× Kapa Fidelity buffer (Kapa Biosystems, Inc., Woburn, Mass.), 300 μM each dNTP, 50 nM each primer (Round 1 Forward and Reverse KRAS, BRAF, and EGFR primers), and 1 unit/50 μL HiFi HotStart DNA polymerase (Kapa Biosystems). Mineral oil was added to all PCR tubes to minimize evaporation during heating. Temperature cycling parameters were 95° C. for 2 minutes, followed by 35 cycles of 98° C. for 20 sec, 64° C. for 20 sec, and 72° C. for 30 sec. A final extension was performed at 72° C. for 1 minute, prior to cooling the reaction to 4° C. EDTA was then added at a final concentration of 5 mM to stop polymerase activity.

The amplicons from each first-round PCR were diluted 5000-fold and used as templates for 3 separate second-round PCRs to individually amplify the hotspot regions of KRAS, BRAF, or EGFR. To promote specific amplification, the second-round primers were nested relative to the primers used in the first round of PCR. The nested primers were labeled with sample-specific barcode sequences to allow multiplexed sequencing of DNA from many samples. The barcode sequences were 6 nucleotides in length, and were designed to differ from all other barcodes in the set at a minimum of 2 positions so that a single sequencing error would not lead to misclassification of samples. Different combinations of 16 forward and 16 reverse barcoded primers could be used to uniquely identify up to 256 different samples. PCR was carried out using the same reaction conditions as were used in the first round, with the following modifications: the annealing temperature was increased to 65° C., and the 3 pairs of multiplexed primers were replaced with a single pair of barcoded primers (Round 2 Forward and Reverse KRAS, BRAF, or EGFR primers listed in Table 1) at a final concentration of 200 nM each. After addition of 5 mM EDTA, the PCR products were mixed together to produce 3 pools, one for each of the 3 replicate reactions. All PCR steps were carried out using a high-fidelity polymerase (HiFi HotStart, Kapa Biosystems).
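As a non-limiting sketch, the way a pair of barcodes can identify one of up to 256 samples may be expressed in Python as follows. The barcode lists and the flanking sequences are copied from Tables 1 and 2; the function names, the sample numbering scheme (forward index × 16 + reverse index), and the assumption that each read begins at the 5′-end of its barcoded primer are hypothetical.

```python
FWD_BARCODES = [
    "AACCTT", "AACGTA", "AAGCAT", "AAGGAA", "ATCCAT", "ATCGAA", "ATGCAA", "ATGGTA",
    "TACCTA", "TACGAA", "TACGTT", "TAGCTT", "TAGGAT", "TTCGAT", "TTGCAT", "TTGCTA",
]
REV_BARCODES = [
    "AATCAA", "AATGAT", "AAGATA", "AACATT", "ATCATA", "ATAGTT", "ATCAAT", "ATGTAT",
    "TAACAT", "TAGTAA", "TATGTA", "TTACAA", "TTCATT", "TTAGTA", "TTCTAA", "TACAAT",
]
# Fixed bases flanking the barcode in the Round 2 primers (Table 1).
FWD_LEAD, FWD_ANCHOR = "GG", "CGAACAGTCTCC"
REV_LEAD, REV_ANCHOR = "GC", "GGATGAGTGCAG"
BARCODE_LEN = 6


def barcode_index(read, lead, anchor, barcodes):
    """Return the 0-based index of the barcode in `read`, or None."""
    start = len(lead)
    end = start + BARCODE_LEN
    # Require the fixed bases on both sides of the barcode to be intact.
    if not read.startswith(lead) or not read.startswith(anchor, end):
        return None
    barcode = read[start:end]
    return barcodes.index(barcode) if barcode in barcodes else None


def sample_index(fwd_read, rev_read):
    """Map a read pair to a sample number in 0..255 (numbering is assumed)."""
    f = barcode_index(fwd_read, FWD_LEAD, FWD_ANCHOR, FWD_BARCODES)
    r = barcode_index(rev_read, REV_LEAD, REV_ANCHOR, REV_BARCODES)
    if f is None or r is None:
        return None
    return f * len(REV_BARCODES) + r


# Forward barcode 2 paired with reverse barcode 3 on KRAS primers.
fwd = "GG" + "AACGTA" + "CGAACAGTCTCC" + "GAATATAAAC"
rev = "GC" + "AAGATA" + "GGATGAGTGCAG" + "TGAATTAGC"
print(sample_index(fwd, rev))
```

Pairs in which either barcode is unrecognized return `None` in this sketch, which mirrors the idea of discarding reads with inconsistent barcodes rather than guessing a sample.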

Production of Barcoded PCR Primers

In order to build flexibility and scalability into the design of the deep sequencing scheme, barcoded oligonucleotides and gene-specific PCR primers were combined in a modular fashion, as illustrated in FIG. 9. A set of 16 unique barcodes was produced for the forward primers, and a different set of 16 barcodes was produced for the reverse primers. These barcodes were 6 nucleotides in length, and were designed to differ from all other barcodes in the set by at least 2 nucleotides (to minimize the probability that miscalled bases would cause misclassification of sequences). Each barcode was incorporated into an oligonucleotide, which was primer-extended using a partially complementary single-stranded template containing the reverse-complement of the gene-specific primer sequence. The sequences of the template and barcode-containing oligonucleotides are listed in Table 2.

TABLE 2 Oligonucleotides used to produce barcoded primers.

Oligonucleotide            Sequence (5′→3′)                            SEQ ID NO:
Forward KRAS template      Biotin-CCAACTACCACAAGTTTATATTCGGAGACTGTTCG  13
Forward EGFR template      Biotin-AATCTGTGATCTTGACATGCTGGGAGACTGTTCG   14
Forward BRAF template      Biotin-GCTAGACCAAAATCACCTATTTGGAGACTGTTCG   15
Forward barcode 1 oligo    GG AACCTT CGAACAGTCTCC                      16
Forward barcode 2 oligo    GG AACGTA CGAACAGTCTCC                      17
Forward barcode 3 oligo    GG AAGCAT CGAACAGTCTCC                      18
Forward barcode 4 oligo    GG AAGGAA CGAACAGTCTCC                      19
Forward barcode 5 oligo    GG ATCCAT CGAACAGTCTCC                      20
Forward barcode 6 oligo    GG ATCGAA CGAACAGTCTCC                      21
Forward barcode 7 oligo    GG ATGCAA CGAACAGTCTCC                      22
Forward barcode 8 oligo    GG ATGGTA CGAACAGTCTCC                      23
Forward barcode 9 oligo    GG TACCTA CGAACAGTCTCC                      24
Forward barcode 10 oligo   GG TACGAA CGAACAGTCTCC                      25
Forward barcode 11 oligo   GG TACGTT CGAACAGTCTCC                      26
Forward barcode 12 oligo   GG TAGCTT CGAACAGTCTCC                      27
Forward barcode 13 oligo   GG TAGGAT CGAACAGTCTCC                      28
Forward barcode 14 oligo   GG TTCGAT CGAACAGTCTCC                      29
Forward barcode 15 oligo   GG TTGCAT CGAACAGTCTCC                      30
Forward barcode 16 oligo   GG TTGCTA CGAACAGTCTCC                      31
Reverse KRAS template      Biotin-CTTGACGATACAGCTAATTCACTGCACTCATCC    32
Reverse EGFR template      Biotin-GGAAGAGAAAGAATACCATGCCTGCACTCATCC    33
Reverse BRAF template      Biotin-TCAGTTTGAACAGTTGTCTGGCTGCACTCATCC    34
Reverse barcode 1 oligo    GC AATCAA GGATGAGTGCAG                      35
Reverse barcode 2 oligo    GC AATGAT GGATGAGTGCAG                      36
Reverse barcode 3 oligo    GC AAGATA GGATGAGTGCAG                      37
Reverse barcode 4 oligo    GC AACATT GGATGAGTGCAG                      38
Reverse barcode 5 oligo    GC ATCATA GGATGAGTGCAG                      39
Reverse barcode 6 oligo    GC ATAGTT GGATGAGTGCAG                      40
Reverse barcode 7 oligo    GC ATCAAT GGATGAGTGCAG                      41
Reverse barcode 8 oligo    GC ATGTAT GGATGAGTGCAG                      42
Reverse barcode 9 oligo    GC TAACAT GGATGAGTGCAG                      43
Reverse barcode 10 oligo   GC TAGTAA GGATGAGTGCAG                      44
Reverse barcode 11 oligo   GC TATGTA GGATGAGTGCAG                      45
Reverse barcode 12 oligo   GC TTACAA GGATGAGTGCAG                      46
Reverse barcode 13 oligo   GC TTCATT GGATGAGTGCAG                      47
Reverse barcode 14 oligo   GC TTAGTA GGATGAGTGCAG                      48
Reverse barcode 15 oligo   GC TTCTAA GGATGAGTGCAG                      49
Reverse barcode 16 oligo   GC TACAAT GGATGAGTGCAG                      50

Barcode sequences are the 6-nucleotide segments set off by spaces.
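The design rule that every barcode differs from every other barcode in its set at 2 or more positions can be checked with a short Python sketch. The forward barcodes are copied from Table 2; the function names are hypothetical and the check is offered only as an illustration, not as the procedure used to design the barcodes.

```python
from itertools import combinations

# Forward barcodes 1 to 16 from Table 2.
FORWARD_BARCODES = [
    "AACCTT", "AACGTA", "AAGCAT", "AAGGAA", "ATCCAT", "ATCGAA", "ATGCAA", "ATGGTA",
    "TACCTA", "TACGAA", "TACGTT", "TAGCTT", "TAGGAT", "TTCGAT", "TTGCAT", "TTGCTA",
]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


min_dist = min_pairwise_distance(FORWARD_BARCODES)
print(min_dist)
```

With a minimum distance of 2, a single miscalled base can make a barcode unrecognizable but cannot convert it into another valid barcode of the set, so such reads can be discarded rather than misassigned.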

Each forward barcode oligo (8 μM) was annealed to each forward template oligo (8 μM) in separate reaction tubes containing 1× NEBuffer 2 (New England Biolabs, Ipswich, Mass.), 200 μM each dNTP, and 1 mM dithiothreitol. Annealing was carried out by heating the solution to 95° C. for 2 minutes, 60° C. for 1 minute, and then slowly cooling to 25° C. over approximately 15 minutes. All possible combinations of forward barcode and template oligos were produced. The set of reverse oligos was annealed in a similar manner. 1 unit/10 μL of DNA polymerase I, Large (Klenow) Fragment (New England Biolabs) was added to each tube, and the reaction was incubated at 25° C. for 30 minutes. The reaction was stopped by adding 25 mM ethylenediaminetetraacetic acid (EDTA) and heating to 75° C. for 20 minutes. A biotin tag attached to the 5′-end of the template oligonucleotide was used to purify the primer-extended products from the reaction mix by binding to high capacity streptavidin-coated agarose resin (ThermoFisher Scientific, Wilmington, Mass.) (5 μL resin slurry added per 50 μL reaction). The resin particles were agitated constantly in the solution at room temperature for 8 hours. The resin was washed three times in buffer containing 10 mM Tris pH 7.6 and 50 mM NaCl. The barcoded PCR primers were then released from the resin-bound template oligos into a fresh 40 μL volume of the same buffer by heat denaturation at 95° C. for 1 minute. After concentration adjustment, the primers were ready for use in PCR.
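The product expected from this primer extension can be modeled in silico. The following Python sketch is illustrative only: the function names are hypothetical, and the model simply assumes that the 3′-end of the barcode oligo anneals to the 3′-end of the template and is extended to the template's 5′-end. The sequences are forward barcode 1 oligo and the forward KRAS template from Table 2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_extend(oligo: str, template: str) -> str:
    """Extend `oligo` along `template` (both written 5'->3').

    Finds the longest 3' suffix of the oligo that pairs with the 3' end of
    the template, then appends the bases templated by the remainder.
    """
    copy_strand = revcomp(template)  # what a full-length copy would read
    for k in range(min(len(oligo), len(copy_strand)), 0, -1):
        if oligo.endswith(copy_strand[:k]):
            return oligo + copy_strand[k:]
    raise ValueError("oligo does not anneal to the 3' end of the template")


barcode_oligo = "GG" + "AACCTT" + "CGAACAGTCTCC"          # SEQ ID NO: 16
kras_template = "CCAACTACCACAAGTTTATATTCGGAGACTGTTCG"     # SEQ ID NO: 13, biotin omitted
product = primer_extend(barcode_oligo, kras_template)
print(product)
```

The modeled product matches the Round 2 Forward KRAS primer of Table 1 with forward barcode 1 in the [FWD BC] position, which is consistent with the templates containing the reverse-complement of the gene-specific primer sequence.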

Analysis of Cell Line DNA

Genomic DNA was purified from human cancer cell lines using the same method used for purifying plasma DNA, after suspending cells in 0.2 mL of phosphate-buffered saline. The following cell lines were used: A549 (having a KRAS Gly12Ser mutation), H1957 (having an EGFR Leu858Arg mutation), and YUSAC (having a BRAF Val600Glu mutation). Cells were passaged in culture for no more than 6 months after being thawed from original stocks. Because cell lines were used only for analysis of short regions of genomic DNA, authentication of lines by our laboratory was limited to sequencing of those regions. To test the performance of the deep sequencing method for a particular gene, DNA derived from cells known to be either mutant or wild-type with respect to that gene was mixed in various ratios between 10,000:1 and 1:10,000. Cell line DNA samples were then amplified and sequenced according to the same methods that were used for plasma DNA.

Ultra-Deep Sequencing

Barcoded PCR products from all samples were mixed to produce 3 separate pools, each corresponding to one set of replicate reactions. Uniquely indexed TruSeq® adapters (Illumina, Inc., San Diego, Calif.) were ligated to each of the 3 pools of PCR amplicons using a modified version of the manufacturer's protocol. Amplicon pools were purified by phenol-chloroform-isoamyl alcohol (PCA, Sigma-Aldrich Co., St. Louis, Mo.) extraction followed by ethanol precipitation. Addition of deoxyadenosine to the 3′-ends of the blunt-ended amplicons was performed according to Illumina's recommendations. PCA extraction and ethanol precipitation were again used for purification. TruSeq adapters were ligated and the products were purified on a 2% agarose gel according to the standard protocol. DNA concentration was estimated using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, Calif.). Without further amplification, the 3 pools were combined and loaded onto a single lane of an Illumina HiSeq® 2000 instrument. Prior to loading, the samples were diluted by adding between 2- and 8-fold excess Phi-X DNA to improve cluster discrimination. Sequencing was carried out in multiplexed, 75 base pair, paired-end mode at the Yale Center for Genomic Analysis.

Data Analysis

A computer script was written to filter, sort, align, and count millions of paired-end sequences. First, a read pair was assigned to a data bin based on the barcode of each read in the pair. Then, based on PCR primer sequences, the pair was assigned to one of the reference genes. Next, the longest stretch of perfect sequence agreement between each pair of reads was determined, and this was used to align the reads to the reference sequence for the gene. A read pair was discarded if either member did not pass Illumina filtering or a nucleotide was reported to be “.”; if there was an inconsistency in barcodes, strands, or PCR tags; or if their region of perfect sequence agreement was less than 36 nucleotides in length. Finally, variant sequences confirmed by reads from both strands were identified and counted within each data bin based on comparison to the reference sequence. A module used to perform sequence alignments using a Smith-Waterman algorithm was taken, with permission, from Dr. Conrad Huang, Resource for Biocomputing, Visualization & Informatics, University of California, San Francisco. A module used to determine the longest common substring was taken from a web resource.
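The read-pair consistency step can be sketched in Python as shown below. This is not the script used in the example: the function names are hypothetical, the longest-common-substring routine is a textbook dynamic-programming version written for this sketch, and the barcode, strand, PCR tag, and Illumina filter checks as well as the Smith-Waterman alignment are omitted. Only the 36-nucleotide threshold and the rejection of reads containing “.” are taken from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous stretch present in both strings (O(len(a)*len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def consistent_overlap(read1: str, read2: str, min_overlap: int = 36):
    """Return the region of perfect agreement, or None if the pair fails.

    read1 and read2 are the two paired-end reads as reported by the
    sequencer, so read2 is reverse-complemented before comparison.
    """
    if "." in read1 or "." in read2:  # uncalled base reported as "."
        return None
    overlap = longest_common_substring(read1, revcomp(read2))
    return overlap if len(overlap) >= min_overlap else None


# Toy reads, far shorter than real 75-base reads, with a lowered threshold.
toy = consistent_overlap("ACGTACGGTCA", revcomp("GTACGGTCATT"), min_overlap=8)
print(toy)
```

A retained overlap would then be aligned to the reference sequence for the gene, and any base differing from the reference within it is supported by both strands of the clone.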

Confirmation of Mutations in Tumor Tissue

Genomic DNA was isolated from paraffin-embedded tumor tissue samples using the QuickExtract™ FFPE DNA Extraction Kit (Epicentre Biotechnologies, Madison, Wis.). Mutation hotspot regions of KRAS, BRAF, and EGFR were amplified using the same PCR primers that were used in the first round of PCR described above. Sanger sequencing was performed on gel-purified amplicons, and mutations were identified from chromatograms using Mutation Surveyor software (SoftGenetics LLC, State College, Pa.).

Determining the Absolute Concentration of Mutant DNA in Plasma

Real-time quantitative PCR was used to measure the concentration of KRAS DNA fragments in each patient's plasma sample. This value was multiplied by the fraction of mutant molecules as determined by deep sequencing in order to calculate the absolute mutant KRAS DNA concentration. PCR conditions were the same as those used in the first round of amplification described above except for the use of a single pair of primers (Round 1 KRAS Fwd and Rev) at 200 nM final concentration, and the addition of SYBR® Green dye (Stratagene, La Jolla, Calif.) at 1:60,000 final dilution. Amplification was carried out using an IQ5 Real-time PCR Detection System with version 2.1 software (Bio-Rad Laboratories, Hercules, Calif.). To enable determination of absolute copy numbers, a standard curve was generated using known concentrations of a cartridge-purified oligonucleotide that was designed to mimic the fragment of KRAS DNA being amplified from plasma. The sequence of the oligonucleotide was: 5′-AAGGCCTGCTGAAAATGACTGAATATAAACTTGTGGTAGATGGAGCTGGTGGCGTAAGCAAGAGTGCCTTGACGATACAGCTAATTCAGAATCATTTTGTGGACGAATA-3′ (SEQ ID NO: 51). Real-time PCRs were performed in triplicate, and the KRAS DNA concentration was determined using the mean of the 3 measurements.
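The calculation described above reduces to a mean followed by a multiplication, as the following Python sketch shows. The function name, the unit of copies per mL, and the numerical inputs are hypothetical and serve only to illustrate the arithmetic; they are not measurements from this example.

```python
from statistics import mean


def absolute_mutant_concentration(total_kras_replicates, mutant_fraction):
    """Absolute mutant concentration from qPCR and deep sequencing.

    total_kras_replicates: total KRAS fragment concentrations from the
        triplicate real-time PCRs (assumed here to be in copies per mL)
    mutant_fraction: fraction of mutant molecules from deep sequencing (0..1)
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    return mean(total_kras_replicates) * mutant_fraction


# Hypothetical values: ~1,000 copies/mL total KRAS, 0.2% mutant fraction.
result = absolute_mutant_concentration([1000.0, 1100.0, 900.0], 0.002)
print(result)
```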

Results

Error Suppression Reveals Low-Abundance Variants

To determine the relative abundance of tumor-specific mutations, massively parallel sequencing was performed on PCR amplicons derived from plasma DNA fragments containing known mutation hotspots. Thousands of clonal sequence reads from each plasma sample were compared to reference sequences in order to identify and quantify variants. For proof of concept, analysis was restricted to frequently mutated codons within 3 oncogenes that commonly develop somatic mutations in various malignancies: codons 12 and 13 of KRAS, codon 600 of BRAF, and codon 858 of EGFR. By designing PCR primers that flank very short regions (<50 bp) surrounding these mutation hotspots, adequate amplification of highly fragmented plasma DNA could be ensured and greater sequence depth could be achieved. Modular attachment of DNA barcode tags to the 5′-ends of the PCR primers allowed sequencing of up to 256 DNA samples in a single batch (FIG. 4A and FIG. 9). A median depth of 108,467 read pairs was obtained per mutation site per sample after filtering and de-multiplexing a total of 86,359,980 raw sequences generated on a single lane of an Illumina HiSeq® 2000 flow cell.

Importantly, the design of short PCR amplicons enabled us to devise a sequencing strategy that could distinguish mutant from wild-type DNA molecules with very high confidence. Illumina's paired-end sequencing mode was modified to achieve partial overlap of 75 base-pair bidirectional reads obtained sequentially from the forward and reverse strands of each clonal DNA cluster on the flow cell (FIG. 4B). Mutation hotspots were included in the overlapping sequence region so that the hotspot within each clone would be read from one strand and then proofread from the opposite strand. By discarding clones that did not have perfect sequence agreement between the two paired-end reads, the vast majority of sequencer-generated errors were eliminated. Imperfect sequence agreement was found in 22% of read pairs that had already passed Illumina's chastity filter. A median error frequency of 0.31% per base was observed when directly comparing single reads derived from either strand of wild-type control samples to known reference sequences. The frequency of such errors was reduced to 0.07% per base in the region of overlap after removing read pairs that lacked sequence consistency.
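The read-pair agreement filter described above can be sketched as follows. This is a minimal illustration and not the pipeline that was actually used: the function names are hypothetical, the amplicon length is assumed to be known, and each read is assumed to be no longer than the amplicon, with read 1 starting at the first base of the amplicon and read 2 starting at its last base on the opposite strand.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_consensus(read1, read2, amplicon_len):
    """Return the overlapping sequence if both reads agree perfectly, else None."""
    read2_fwd = reverse_complement(read2)    # read 2 in forward-strand orientation
    start2 = amplicon_len - len(read2)       # amplicon position where read2_fwd begins
    overlap_start, overlap_end = start2, len(read1)
    if overlap_end <= overlap_start:
        return None                          # the reads do not overlap at all
    seg1 = read1[overlap_start:overlap_end]
    seg2 = read2_fwd[: overlap_end - overlap_start]
    return seg1 if seg1 == seg2 else None


def keep_concordant(read_pairs, amplicon_len):
    """Discard read pairs whose overlapping regions disagree at any base."""
    kept = []
    for read1, read2 in read_pairs:
        consensus = overlap_consensus(read1, read2, amplicon_len)
        if consensus is not None:
            kept.append(consensus)
    return kept
```

Requiring perfect agreement is the strictest possible rule; it trades a loss of read pairs (22% in the data above) for removal of most sequencer-generated errors.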

Any remaining errors were highly unlikely to be caused by coincidentally consistent misreads from opposite ends of a clone. Rather, most of these errors were probably present within the DNA molecules being sequenced, introduced by polymerase misincorporations or DNA damage. To further discriminate true mutations from such errors, all amplification and processing steps were performed in triplicate, and the mean of the three mutation counts was determined. This was done based on the premise that true mutations would be reproducibly counted in all three instances, whereas counts from randomly occurring errors would be more variable (recognizing that the distribution of errors is not entirely random). Using this approach, the frequency of miscalls of specific mutations from known wild-type samples was reduced to a median value of 0.014% (interquartile range [IQR]: 0.0052% to 0.023%; Table 3). Suppression of errors in this manner permitted rare mutations to be identified with a high degree of certainty (FIG. 5).

TABLE 3 Background level of spurious mutation counts obtained from known wild-type samples.

Mutation Type | Fraction of mutant:wild-type counts (from 3 replicate PCRs)
BRAF Val600Glu | 0.00013
EGFR Leu858Arg | 0.000047
KRAS Gly12Ser | 0.00029
KRAS Gly12Val | 0.00018
KRAS Gly12Arg | 0.000057
KRAS Gly12Asp | 0.00015
KRAS Gly12Ala | 0.000014
KRAS Gly12Cys | 0.00025
KRAS Gly13Ser | 0.00015
KRAS Gly13Val | 0.00029
KRAS Gly13Arg | 0.000050
KRAS Gly13Asp | 0.000044
KRAS Gly13Ala | 0.000058
KRAS Gly13Cys | 0.00049
Median Value | 0.00014
Interquartile range | 0.000052 to 0.00023
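The summary statistics of Table 3 can be recomputed from the 14 tabulated fractions, as sketched below. The quartile convention is an assumption (linear interpolation between order statistics, Python's "inclusive" method); the source does not state which convention was used, although this one agrees with the tabulated range after rounding to two significant figures.

```python
from statistics import median, quantiles

# Fractions of mutant:wild-type counts, copied from Table 3.
background = {
    "BRAF Val600Glu": 0.00013,
    "EGFR Leu858Arg": 0.000047,
    "KRAS Gly12Ser": 0.00029,
    "KRAS Gly12Val": 0.00018,
    "KRAS Gly12Arg": 0.000057,
    "KRAS Gly12Asp": 0.00015,
    "KRAS Gly12Ala": 0.000014,
    "KRAS Gly12Cys": 0.00025,
    "KRAS Gly13Ser": 0.00015,
    "KRAS Gly13Val": 0.00029,
    "KRAS Gly13Arg": 0.000050,
    "KRAS Gly13Asp": 0.000044,
    "KRAS Gly13Ala": 0.000058,
    "KRAS Gly13Cys": 0.00049,
}

values = list(background.values())
med = median(values)
# Assumed convention: quartiles by linear interpolation ("inclusive" method).
q1, _, q3 = quantiles(values, n=4, method="inclusive")

print(f"median = {med:.2%}, IQR = {q1:.4%} to {q3:.3%}")
```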

Sensitive and Accurate Quantitation of Mutant DNA

Next, mutant and wild-type DNA levels were measured over a broad range of relative concentrations. Genomic DNA from KRAS-, BRAF-, or EGFR-mutant cancer cell lines was mixed in different ratios, and then subjected to amplification and deep sequencing. Mutant DNA could be accurately and reproducibly measured in a linear manner over approximately 4 orders of magnitude and down to levels as low as 1 in 10,000 molecules (FIG. 6). Also, by testing combinations of DNA from multiple mutant cell lines, the assay was able to simultaneously quantify more than one mutation from a given sample.

Monitoring ctDNA Levels in Cancer Patients

To evaluate the assay on clinical samples, plasma collected from patients with non-small cell lung cancer (NSCLC) at various times before, during, or after treatment was analyzed. Patients were enrolled in the study (and their plasma DNA was tested) without prior knowledge of the mutation status of their tumors. A total of 117 samples were obtained from 30 patients (17 patients with adenocarcinoma, 9 with undifferentiated NSCLC, and 4 with squamous cell carcinoma). KRAS Gly12Asp, Gly12Val, Gly12Cys, or Gly13Asp point-mutations were detectable in the plasma DNA of 6 patients out of 26 with adenocarcinoma or undifferentiated NSCLC. As expected, no KRAS mutations were found in specimens from patients with squamous cell carcinoma. BRAF and EGFR mutations were not detectable in any plasma samples. This was somewhat surprising for EGFR, which has a reported prevalence of activating mutations in NSCLC of approximately 10% (Lynch et al., N Engl J Med. 2004; 350: 2129-2139; Paez et al., Science. 2004; 304: 1497-1500; Pao et al., Proc. Natl. Acad. Sci. USA. 2004; 101: 13306-13311). However, evaluation of 21 available tumor tissue specimens confirmed the absence of EGFR mutations in this population (mutations occurring outside of the sequenced hotspot region may have been missed). The presence or absence of KRAS mutations in all tested tumor samples was found to be concordant with the findings in plasma: 5 patients had identical KRAS mutations in both tumor and plasma, and 16 patients had no KRAS mutations detected from either source. Tumor tissue was unavailable or insufficient for 1 patient with mutant KRAS in the plasma, and for 4 patients with no plasma mutations. Table 4 lists the clinical characteristics and mutation findings for all enrolled patients.

TABLE 4 Clinical characteristics and mutation findings.

Patient No. | Sex | Age | Stage | Plasma Samples: Mutation Type* | Plasma Samples: No. of Samples | NSCLC Histology | Tumor Tissue: Tissue Source | Tumor Tissue: Tissue Mutation | Method of Tissue Mutation Testing
1 | M | 82 | IV | KRAS WT; EGFR WT; BRAF WT | 2 | Adeno | Cells in pleural fluid | KRAS WT; EGFR WT | Clinical lab
2 | M | 68 | IV | KRAS WT; EGFR WT; BRAF WT | 2 | Adeno | Lung core Bx | Tissue not available from outside hospital |
3 | F | 51 | IV | KRAS Gly12Asp; EGFR WT; BRAF WT | 3 | Adeno | Tracheal Bx | KRAS Gly12Asp; EGFR WT; BRAF WT | Sanger seq.
4 | M | 71 | IIIB | KRAS WT; EGFR WT; BRAF WT | 12 | Squam | Para-tracheal lymph node Bx | Not tested (squamous histology) |
5 | M | 44 | IV | KRAS Gly12Val; EGFR WT; BRAF WT | 5 | Adeno | Lung core Bx | KRAS Gly12Val; EGFR WT; BRAF WT | Sanger seq.
6 | M | 68 | Lung: IA; Prost.: II; Esoph.: III | KRAS WT; EGFR WT; BRAF WT | 13 | Adeno | Lung core Bx | Excess tissue not available |
7 | M | 59 | IIIA | KRAS WT; EGFR WT; BRAF WT | 8 | Squam | Lung core Bx | Not tested (squamous histology) |
8 | F | 70 | IIIA | KRAS WT; EGFR WT; BRAF WT | 3 | Adeno | Lung core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
9 | M | 72 | IIIB | KRAS Gly12Val; EGFR WT; BRAF WT | 4 | Undiff | Bronchial brushing | KRAS Gly12Val; EGFR WT; BRAF WT | Sanger seq.
10 | F | 62 | Lung: IV; Breast: I | KRAS WT; EGFR WT; BRAF WT | 1 | Lung Adeno and Breast Adeno | Iliac wing core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
11 | F | 79 | IV | KRAS Gly12Val; EGFR WT; BRAF WT | 1 | Adeno | Lung fine needle aspirate | KRAS Gly12Val; EGFR WT; BRAF WT | Sanger seq.
12 | M | 69 | IV | KRAS WT; EGFR WT; BRAF WT | 3 | Adeno | Scapula mass Bx | KRAS WT; EGFR WT | Clinical lab
13 | F | 61 | Lung: IV; Breast: I | KRAS WT; EGFR WT; BRAF WT | 3 | Undiff | Pre-tracheal lymph node needle aspirate | KRAS WT; EGFR WT | Clinical lab
14 | M | 77 | IV | KRAS Gly12Cys; EGFR WT; BRAF WT | 2 | Adeno | Calf mass excision | KRAS Gly12Cys; EGFR WT; BRAF WT | Sanger seq.
15 | M | 65 | IV | KRAS Gly13Asp; EGFR WT; BRAF WT | 2 | Undiff | Bronchial brushing | Excess tissue not available |
16 | F | 73 | IV | KRAS WT; EGFR WT; BRAF WT | 3 | Undiff | Bronchial Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
17 | F | 65 | IA | KRAS WT; EGFR WT; BRAF WT | 5 | Adeno | Lung core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
18 | F | 77 | IV | KRAS WT; EGFR WT; BRAF WT | 1 | Adeno | Lung core Bx | Excess tissue not available |
19 | F | 75 | IV | KRAS WT; EGFR WT; BRAF WT | 2 | Adeno | Bronchial Bx | KRAS WT; EGFR WT | Clinical lab
20 | M | 73 | IB | KRAS WT; EGFR WT; BRAF WT | 5 | Squam | Lung lobectomy | Not tested (squamous histology) |
21 | M | 73 | IIB | KRAS WT; EGFR WT; BRAF WT | 4 | Adeno | Lung core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
22 | F | 68 | IV | KRAS WT; EGFR WT; BRAF WT | 3 | Undiff | Lung tumor excision | KRAS WT; EGFR WT | Clinical lab
23 | F | 79 | IA | KRAS WT; EGFR WT; BRAF WT | 3 | Undiff | Lung core Bx | Tissue not available from outside hospital |
24 | M | 64 | IIIB | KRAS WT; EGFR WT; BRAF WT | 8 | Squam | Lung core Bx | Not tested (squamous histology) |
25 | F | 73 | Locally recur. IB | KRAS WT; EGFR WT; BRAF WT | 8 | Undiff | Lung lobectomy | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
26 | F | 63 | IIIB | KRAS WT; EGFR WT; BRAF WT | 1 | Adeno | Lung core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
27 | F | 74 | IIIA | KRAS WT; EGFR WT; BRAF WT | 4 | Undiff | Para-tracheal lymph node Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq.
28 | F | 61 | IV | KRAS WT; EGFR WT; BRAF WT | 3 | Adeno | Spine Met Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq. and Clinical lab
29 | F | 82 | IV | KRAS WT; EGFR WT; BRAF WT | 2 | Undiff | Lung core Bx | KRAS WT; EGFR WT; BRAF WT | Sanger seq. and Clinical lab
30 | F | 69 | Lung: IV; Breast: IV | KRAS WT; EGFR WT; BRAF WT | 1 | Adeno | Lung fine needle aspirate | KRAS WT; EGFR WT; BRAF WT | Sanger seq.

The list is ordered by date of first specimen collection.
*Plasma DNA was only tested for mutations at codons 12 and 13 of KRAS, 858 of EGFR, and 600 of BRAF.
Squam = Squamous cell carcinoma
Adeno = Adenocarcinoma
Undiff = Undifferentiated NSCLC (not otherwise specified)
WT = Wild-type
Bx = Biopsy
Sanger Seq. = Direct Sanger sequencing of tissue-derived PCR amplicons by our laboratory.
Clinical lab = Mutations tested for clinical purposes in a laboratory certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA). Tissue was not tested for BRAF mutations by clinical laboratories because of low prevalence in NSCLC.

For patients with detectable plasma DNA mutations, changes in measured ctDNA levels were followed in the context of therapeutic interventions or disease progression. To determine the absolute concentration of mutant KRAS DNA fragments in a plasma sample, the total concentration of KRAS fragments was measured by real-time PCR and then multiplied by the fraction of mutant molecules determined by deep sequencing. The median concentration among samples with detectable mutations was 5,694 mutant KRAS molecules per mL (IQR: 2,655 to 25,123). Time-courses of mutant ctDNA measurements for patients who had 3 or more samples collected are shown in FIG. 7 (data for patients with fewer measurements are shown in FIG. 8). In two cases, the ctDNA level decreased upon treatment with radiation and/or systemic therapy. Aggressive progression of metastatic disease in a different patient was accompanied by a substantial rise in ctDNA. In another two cases, ctDNA levels increased shortly after initiating treatment, perhaps because more tumor DNA was released into the bloodstream as cancer cells were being killed.

Example 2

This example includes methods that incorporate elements of Example 1, but also includes several modifications (FIGS. 10 and 11). In this example, 40 different genomic target regions were analyzed. Of the 40 genomic target regions, 38 were prone to developing somatic mutations, and 2 were included as controls that were not expected to be mutated.

Preparation of Mixtures of Primers Having Combinations of Modular Oligonucleotide Segments

As described previously, early tagging of targeted DNA template molecules required the production of mixtures of primers having a common barcode in their 5′ region, and having several different gene-specific primer segments at their 3′ end. Here, modular oligonucleotide segments were combined during oligonucleotide synthesis on an automated synthesizer (“modular automated synthesis and purification”); the approach is illustrated in FIG. 3.

Each different gene-specific 3′-portion was synthesized on a separate oligonucleotide synthesis column. Standard phosphoramidite chemistry was used, and the oligonucleotides were grown on a solid support. Both polystyrene and controlled-pore-glass were used as solid supports, but polystyrene was preferable. Both types of supports performed similarly. The solid support consisted of small particles that appeared as a powder. The powder was contained within an oligonucleotide synthesis column, sandwiched loosely between two frits. Multiple different 3′-segments were grown (oligomerized by chemical coupling of phosphoramidite monomers) in separate synthesis columns on an automated synthesizer in the 3′ to 5′ direction. The synthesis was paused, and partially synthesized oligonucleotides were left on the column in the protected state with the trityl group left on.

“Pipette tip”-style oligonucleotide synthesis columns were utilized with sufficient controlled-pore glass (1000 angstrom pore size) or polystyrene to synthesize oligos at the 40 nanomole or 200 nanomole scale (3-Prime, Aston, Pa.). Forty different partial 3′ oligonucleotide segments were synthesized on 40 separate columns using a Dr. Oligo 192 automated synthesizer. The oligonucleotides were not cleaved from the solid supports, were not deprotected, and the trityl group was left on so that further synthesis could be continued. The sequences of these 40 different 3′ segments are listed in Table 5.

TABLE 5 List of forty 3′ oligonucleotide segments synthesized in separate columns for first phase of modular automated synthesis.

Name | DNA Sequence | SEQ ID NO:
3′ segment1 | AGACGTGTGCTCTTCCGATCTNNNNNNCTGTGCTGTGACTGCTTG | 52
3′ segment2 | AGACGTGTGCTCTTCCGATCTNNNNNNTAGCACATGACGGAGGTT | 53
3′ segment3 | AGACGTGTGCTCTTCCGATCTNNNNNNACAAATACTCCACACGCAAATT | 54
3′ segment4 | AGACGTGTGCTCTTCCGATCTNNNNNNATATTTGGATGACAGAAACACTT | 55
3′ segment5 | AGACGTGTGCTCTTCCGATCTNNNNNNCTGTGATGATGGTGAGGATGG | 56
3′ segment6 | AGACGTGTGCTCTTCCGATCTNNNNNNCTGGGACGGAACAGCTTTGAG | 57
3′ segment7 | AGACGTGTGCTCTTCCGATCTNNNNNNTGCAATTTCTACACGAGATCCTCT | 58
3′ segment8 | AGACGTGTGCTCTTCCGATCTNNNNNNTCTTTGGAGTATTTCATGAAACAAATGA | 59
3′ segment9 | AGACGTGTGCTCTTCCGATCTNNNNNNAACAGTAAAAATAGGTGATTTTGGTCTA | 60
3′ segment10 | AGACGTGTGCTCTTCCGATCTNNNNNNTGCAACTACTGGACGCTGGAC | 61
3′ segment11 | AGACGTGTGCTCTTCCGATCTNNNNNNCTCAATTTTGTTTCAGGACCTGCT | 62
3′ segment12 | AGACGTGTGCTCTTCCGATCTNNNNNNCTGGCAGCAACAGTCTTACCT | 63
3′ segment13 | AGACGTGTGCTCTTCCGATCTNNNNNNACCCAGCTTGGAGGCTGC | 64
3′ segment14 | AGACGTGTGCTCTTCCGATCTNNNNNNAGCCAGGCCGCTGAAGACA | 65
3′ segment15 | AGACGTGTGCTCTTCCGATCTNNNNNNGGCAATTCACTGTAAAGCTGGAAAG | 66
3′ segment16 | AGACGTGTGCTCTTCCGATCTNNNNNNATGAAGATATATTCCTCCAATTCAGGAC | 67
3′ segment17 | AGACGTGTGCTCTTCCGATCTNNNNNNGCGTTTCCTTTAACCACATAATTAGAATC | 68
3′ segment18 | AGACGTGTGCTCTTCCGATCTNNNNNNGTTTTCCCTTTCTCCCCACAG | 69
3′ segment19 | AGACGTGTGCTCTTCCGATCTNNNNNNGTTCCTGTAGCAAAACCAGAAATC | 70
3′ segment20 | AGACGTGTGCTCTTCCGATCTNNNNNNCGGTGAGAAAGTTAAAATTCCCGTC | 71
3′ segment21 | AGACGTGTGCTCTTCCGATCTNNNNNNAAGCATGTCAAGATCACAGATTTTG | 72
3′ segment22 | AGACGTGTGCTCTTCCGATCTNNNNNNCTCACCTCCACCGTGCAGCT | 73
3′ segment23 | AGACGTGTGCTCTTCCGATCTNNNNNNGACCACCCGCACGTCTGT | 74
3′ segment24 | AGACGTGTGCTCTTCCGATCTNNNNNNTCTTCCATACTTGATTCATGATATTTTACT | 75
3′ segment25 | AGACGTGTGCTCTTCCGATCTNNNNNNGACCTCCTCAAACAGCTCAAAC | 76
3′ segment26 | AGACGTGTGCTCTTCCGATCTNNNNNNATGGGAGATCTTCACGCTGG | 77
3′ segment27 | AGACGTGTGCTCTTCCGATCTNNNNNNTCCCTGAGCGTCATCTGCC | 78
3′ segment28 | AGACGTGTGCTCTTCCGATCTNNNNNNCGCTGGTGGAGGCTGACGA | 79
3′ segment29 | AGACGTGTGCTCTTCCGATCTNNNNNNGTTCCCTATCAAATATGTCAACGACT | 80
3′ segment30 | AGACGTGTGCTCTTCCGATCTNNNNNNAATTTTGGTCTTGCCAGAGACA | 81
3′ segment31 | AGACGTGTGCTCTTCCGATCTNNNNNNTATCGACTCCACCGAGGTCA | 82
3′ segment32 | AGACGTGTGCTCTTCCGATCTNNNNNNATACTTGGAGGACCTGCACG | 83
3′ segment33 | AGACGTGTGCTCTTCCGATCTNNNNNNGTCGTCAAGGCACTCTTGCCT | 84
3′ segment34 | AGACGTGTGCTCTTCCGATCTNNNNNNCGATATTCTCGACACAGCAGGT | 85
3′ segment35 | AGACGTGTGCTCTTCCGATCTNNNNNNATCAGTGCGCTTTTCCCA | 86
3′ segment36 | AGACGTGTGCTCTTCCGATCTNNNNNNTGACATACTGGATACAGCTGGA | 87
3′ segment37 | AGACGTGTGCTCTTCCGATCTNNNNNNGTGGTCAGCGCACTCTTGCCC | 88
3′ segment38 | AGACGTGTGCTCTTCCGATCTNNNNNNTCATCCTGGATACCGCCGGC | 89
3′ segment39 | AGACGTGTGCTCTTCCGATCTNNNNNNATCCTGTTTATAATATTGACAAAACACCT | 90
3′ segment40 | AGACGTGTGCTCTTCCGATCTNNNNNNATCAGGACAAAGTCCGGATTGA | 91

These oligonucleotides were synthesized at the 200 nanomole scale, with the oligo left on the column in the protected state with the trityl group left on. Positions marked “N” have equal probability of being A, C, G, or T.

The solid supports of all 40 partially synthesized oligonucleotides were dried by blowing argon gas through the columns, and then the controlled-pore glass or polystyrene powder from all 40 columns was mixed by pouring the contents of each column (after cutting the tops off the columns) into a common container (such as a glass vial). The solid support particles were then suspended in a solvent of similar density so that the particles could be thoroughly mixed and then the mixture could be dispensed into fresh oligonucleotide synthesis columns. When using polystyrene supports, a 3:1 mixture of dichloromethane:acetonitrile was used as the suspension liquid, and when using controlled-pore glass supports, a 5:1 mixture of 1,2-dibromoethane:acetonitrile was used as the suspension liquid. The particles were maintained as a uniform slurry in the liquid by constantly swirling or agitating the vial while using a pipette to dispense equal volumes of the slurry into fresh columns (with the bottom frit already in place). The slurry was dispensed into 96 fresh columns. The particles settled onto the frits, while the liquid drained out from the bottom of the columns by gravity. To ensure that the particles had all settled onto the frit, the columns were filled with acetonitrile and this was again allowed to drain out from the bottom by gravity. After the acetonitrile had fully drained out, the top frits were put in place to secure the powder into the columns.

The new columns were then placed back on the automated synthesizer, and the oligonucleotide synthesis was continued. Each column was assigned a different barcode sequence that was incorporated into the 5′ oligonucleotide segment. A “dummy base” was added to the 3′ end of the 5′ segment sequence when programming the synthesizer in order to account for the partially synthesized oligonucleotides that were already present on the solid supports. The sequences of the 96 different 5′ segments consisted of the following common sequence with each of 96 different barcodes inserted in the position marked [BC1-96]. One unique barcode was used per oligonucleotide synthesis column.

(SEQ ID NO: 92)
5′-CGAGACGGATCAAGCAGAAGACGGCATACGAGATNN[BC1-96]GTGACTGGAGTTC(T)-3′

The “T” in parentheses at the 3′-end of the sequence is the “dummy base”. The 96 barcodes that were used are listed in Table 6. The automated synthesizer was programmed to carry out synthesis at a 40 nmole scale (which determines the volume of reagents passed through the columns), although the actual amount of solid support in each column was likely to produce less than 40 nmoles of oligonucleotides.
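The composition of the resulting primer mixtures can be sketched in silico as follows. This is an illustration only: the function and variable names are hypothetical, only two barcodes (Table 6) and two 3′ segments (Table 5, with line-wrapped fragments joined) are shown, and the “dummy base” is assumed to be a programming placeholder that is not itself added to the oligonucleotide.

```python
# Common 5' segment of SEQ ID NO: 92, written without the 3' "dummy base".
FIVE_PRIME_TEMPLATE = "CGAGACGGATCAAGCAGAAGACGGCATACGAGATNN{barcode}GTGACTGGAGTTC"


def full_length_primers(barcode, three_prime_segments):
    """Join one barcoded 5' segment to every 3' segment present in the mixed support."""
    five_prime = FIVE_PRIME_TEMPLATE.format(barcode=barcode)
    return [five_prime + segment for segment in three_prime_segments]


barcodes = ["CCGATATT", "GCCATATT"]  # barcodes 1 and 2
segments = [
    "AGACGTGTGCTCTTCCGATCTNNNNNNCTGTGCTGTGACTGCTTG",  # 3' segment1
    "AGACGTGTGCTCTTCCGATCTNNNNNNTAGCACATGACGGAGGTT",  # 3' segment2
]

# One synthesis column per barcode; each column yields a mixture of primers
# that share the barcode and differ in their gene-specific 3' ends.
primer_pools = {bc: full_length_primers(bc, segments) for bc in barcodes}
```

With all 96 barcodes and all 40 segments, the same loop would describe 96 mixtures of 40 primers each.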

TABLE 6 List of 96 sample-specific barcodes

Barcode # | Sequence (BBBBBBBB)
1 | CCGATATT
2 | GCCATATT
3 | TCTGGATT
4 | ACTCGATT
5 | TCACGATT
6 | ACTGCATT
7 | TCAGCATT
8 | TCTCCATT
9 | ATTGGTGT
10 | TTAGGTGT
11 | ATACGTGT
12 | ATAGCTGT
13 | ATTCCTGT
14 | TTACCTGT
15 | CTCTATGT
16 | CTCTTAGT
17 | CTGATAGT
18 | GTCATAGT
19 | ATAGGAGT
20 | ATTCGAGT
21 | TTACGAGT
22 | ATTGCAGT
23 | TTAGCAGT
24 | ATACCAGT
25 | AGTGGTCT
26 | TGAGGTCT
27 | TGTCGTCT
28 | TGTGCTCT
29 | AGTCCTCT
30 | TGACCTCT
31 | CGCTATCT
32 | CGCTTACT
33 | CGGATACT
34 | GGCATACT
35 | TGTGGACT
36 | AGTCGACT
37 | TGACGACT
38 | AGTGCACT
39 | TGAGCACT
40 | TGTCCACT
41 | TCATTGTG
42 | TCTATGTG
43 | ACTATCTG
44 | ACTTACTG
45 | TCATACTG
46 | ATTATCGG
47 | TTATACGG
48 | TGATTGCG
49 | TGTATGCG
50 | AGTATCCG
51 | AGTTACCG
52 | TGATACCG
53 | ACTATGTC
54 | ACTTAGTC
55 | TCATAGTC
56 | TCATTCTC
57 | TCTATCTC
58 | ATTATGGC
59 | TTATAGGC
60 | TTATTCGC
61 | AGTATGCC
62 | AGTTAGCC
63 | TGATAGCC
64 | TCTGGTTA
65 | TCACGTTA
66 | TCAGCTTA
67 | TCTCCTTA
68 | CCGTATTA
69 | GCCTATTA
70 | CCGTTATA
71 | GCCTTATA
72 | TCAGGATA
73 | TCTCGATA
74 | TCTGCATA
75 | TCACCATA
76 | CTGATTGA
77 | GTCATTGA
78 | TTACGTGA
79 | TTAGCTGA
80 | CTGTATGA
81 | GTCTATGA
82 | CGGATTCA
83 | GGCATTCA
84 | TGTGGTCA
85 | TGACGTCA
86 | TGAGCTCA
87 | TGTCCTCA
88 | CGGTATCA
89 | GGCTATCA
90 | CGGTTACA
91 | GGCTTACA
92 | CGCATACA
93 | TGAGGACA
94 | TGTCGACA
95 | TGTGCACA
96 | TGACCACA
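One property of a barcode set that can be checked computationally is the minimum number of positions at which any two barcodes differ, since this determines how many base misreads are needed to convert one barcode into another. The sketch below runs the check on the first eight barcodes of Table 6 only. The function names are hypothetical, and the source does not state which design criterion, if any, was applied to the full set.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs of barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Barcodes 1 to 8 from Table 6.
first_eight = [
    "CCGATATT", "GCCATATT", "TCTGGATT", "ACTCGATT",
    "TCACGATT", "ACTGCATT", "TCAGCATT", "TCTCCATT",
]

print(min_pairwise_distance(first_eight))
```

Passing the full list of 96 barcodes to `min_pairwise_distance` would extend the same check to the whole table.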

After completion of the second phase of the modular synthesis, the oligos were cleaved off the solid supports with the trityl group still left on. They underwent rapid deprotection followed by purification on a separate Glen-Pak DNA reverse-phase cartridge for each of the 96 oligonucleotide mixtures (Glen Research, Sterling, Va.). The trityl group at the 5′-end of completed oligonucleotides was selectively retained by the cartridge, enriching for full-length products and removing failure sequences that did not contain the trityl group. The trityl group was removed upon completion of purification. The purified oligonucleotides were then dried and re-suspended in 10 mM Tris pH 7.6 to produce a 50 micromolar working stock solution. Polyacrylamide gel purification was used in some cases to further purify the full-length oligonucleotides.

Collection and Processing of Patient Plasma Samples

Blood was collected by venipuncture into a vacuum tube containing potassium-EDTA. Various tube sizes were used, typically between 3 mL and 10 mL. Blood was inverted in the tube several times at the time of collection to ensure even mixing of the K₂-EDTA. Samples were stored temporarily and transported at room temperature (20-25° C.) prior to separation of plasma. Plasma was separated and frozen as soon as possible after blood collection, preferably within 3 or 4 hours. The collection tubes were centrifuged at 1000×g for 10 minutes in a clinical centrifuge with a swinging bucket rotor with slow acceleration and deceleration (brake off). Plasma was removed from the red blood cells and buffy coat using a 1 mL pipette, being careful not to disturb the cells at the bottom of the tube (to avoid aspirating white blood cells which would lead to increased background wild-type DNA levels). The plasma was dispensed into 1.5 mL cryovials in 0.5 to 1 mL aliquots. The plasma was then frozen at −80° C. until needed for further processing.

Extraction and Purification of DNA from Plasma

Plasma was removed from the −80° C. freezer and was thawed at room temperature for 15 to 30 minutes before proceeding with DNA extraction. Thawed plasma was then centrifuged at 6800×g for 3 minutes to remove any cryoprecipitate. The supernatant was transferred to a fresh tube for further processing.

The QiaAmp® DNA Blood Mini Kit (Qiagen) was used for purification from plasma volumes up to 200 μL (elution volume of 50 μL), and the QiaAmp® MinElute® Virus Vacuum Kit (Qiagen) was used for plasma volumes up to 1 mL (elution volume as low as 20 μL). For larger volumes of a particular sample of plasma, more than one column of the QiaAmp® MinElute® Virus Vacuum Kit was used for purification. All kits were used according to the manufacturer's instructions, generally eluting the DNA into the lowest recommended volume (preferably 20 μL). To process 1 mL of plasma using the QiaAmp® MinElute® Virus Vacuum Kit, 5 micrograms of carrier RNA (cRNA; Qiagen) were added per mL, and the user-developed protocol found on the Qiagen website was followed.

Primer-Extension Reaction

Specific mutation-prone regions of purified, plasma-derived template DNA molecules were copied using targeted gene-specific primers. The number of different gene-specific primer sequences used in each tube depended on the number of targeted DNA regions within the genome. A combination of 40 different gene-specific primers was used in each sample to target 40 different gene regions. As described previously, each set of gene-specific primers had a unique, sample-specific DNA sequence (a barcode) near the 5′-end of the primers that was incorporated in a modular fashion. Each sample underwent primer-extension using an approximately equimolar concentration of 40 different gene-specific primers, all of which had the same sample-specific barcode. These primers also included degenerate sequence regions known as molecular lineage tags (MLTs) as well as common sequences at the 5′-end that allowed for hybridization of “universal” PCR primers in subsequent steps.

Control DNA molecules containing known mutations were spiked into each primer extension reaction to serve as internal quantitative standards. These DNA molecules were cartridge-purified oligonucleotides that were synthesized to contain variations from the wild-type sequence at two distinct positions (which would be extremely unlikely to occur in plasma-derived DNA). These variations allowed the control sequences to be readily distinguished from other variants within DNA purified from a clinical sample. The sequences of the top strands of these control DNA oligonucleotides are listed in Table 7. Reverse complements of these 40 sequences were also separately synthesized to produce bottom strands. In order to make the control DNA as similar as possible to the clinically-derived DNA, both strands were annealed to make them double-stranded before adding them to the primer-extension reaction. The double-stranded DNA was quantified by UV spectrometry and then diluted to the desired concentration. To each primer-extension reaction, approximately 200 copies of the double-stranded control DNA fragments corresponding to each of the 40 gene target sites were added.

TABLE 7 List of spiked-in quantitative standard oligonucleotides containing mutations at 2 distinct sites relative to wild-type.

Name | DNA Sequence | SEQ ID NO:
CFTP-1 | TCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTCCACACCCCCGCCCGGCACCCGCGTCCGCGTCATGACCATCTACAAGCAGTCACAGCACATGA | 93
CFTP-2 | TCACAGCACATGACGGAGGTTGTGAGGCGCTGCCACCACCATGTGCGCTGCTCAGATAGCGATGGTGAGCAGCTGGGGCTGGAGAGACGACAGGGCTG | 94
CFTP-3 | ACAGGGCTGGTTGCCCAGGGTCCCCAGGCCTCTGATTCCTCACTGATTGCTCTTAGGTCTGGCCCCTCCTCAGCATCATATCCGAGTCGAAGGAAATTTGCGTGTGGAGTATTTGGATG | 95
CFTP-4 | GAGTATTTGGATGACAGAAACACTTTTCGACACAGTGTGGTGATGCCCTATGAGCCGCCTGAGGTCTGGTTTGCAACTGGGGTCTCTGGGAGGAGGGGTTAAGGGTGGTTGTCAGTGGCCCTC | 96
CFTP-5 | CTGGCCTCATCTTGGGCCTGTGTTATCTCCTAGGTTGGCTCTGACTGTACCACCATCCACTACAACGACATGTGTAACTGTTCCTGCATGGGCGGCATGAACCGGAGGCCCATCCTCACCATCATCACACTGG | 97
CFTP-6 | TACTGGGACGGAACAGCTTTGAGGTGCGTGTTTGTGCATGTCCTGGGACAGACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCCCCAG | 98
CFPIK-1 | CAAAGCAATTTCTACACGAGATCCTCTCTCTGAAGTCAGTGAGCAGGAGAAAGATTTTCTATGGAGTCACAGGTAAGTGCTAAAATGGAGATTCTCTGTTTCTTTTT | 99
CFPIK-2 | GAGGCTTTGGAGTATTTCATGAAACAAATGAATCATACACATCATGGTGGCTGGACAACAAAAATGGATTGGATCTTCCACACAATTAAACAGCATGCATTGAACTGAAAAG | 100
CFBRAF | CCTCACAGTAAAAATAGGTGATTTTGGTCTAGCGACAGTGAAAGCTCGATGGAGTGGGTCCCATCAGTTTGAACAGTTGTCTGGATCCATTTTGTGGATGGTAAGAATT | 101
CFFox | AAGGGCAACTACTGGACGCTGGACCCGACCTGCGCAGACATGTTCGAGAAGGGCAACTACCGGCGCCGCCGCCGCATGAAGAGGCCCTTCCGGCCG | 102
CFGNAS | ACCTCAATTTTGTTTCAGGACCTGCTTCACTGCCGTATCCTGACTTCTGGAATCTTTGAGACCAAGTTCCAGGTGGACAAAGTCAACTTCCAGTAAGCCAACT | 103
CFCTNN | CACTGGCAGCAACAGTCTTACCTGGACTCTGGAATCCATTCTGATGCCACTACCACAGATCCTTCTCTGAGTGGTAAAGGCAATCCTGAGGAAGAGGATGTGGATACCTCCCAAGTCCTGTAT | 104
CFPPP-1 | CTGCCTGCTGCCTCAGGATCCCCGTCCCCGACTCCCAGGTACTTCCGGAACCTGTGCTCAGATGACACCCCCACGGTGCGGCGGACCGCAGCCTCCAAGCTGGGGGAG | 105
CFPPP-2 | CTGCGCCAGGCCGCTGAAGACAAGACCTGGCGCATCCGCTACATGGTGGCTGACAAGTTCACAGAGGTAGATGAGCGACCGTTGACATTGTCCCACTGGT | 106
CFPTEN-1 | TGCAGCAATTCACTGTAAAGCTGGAAAGGGACGAACAGGTGTAATGACATGTGCATATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTAGATTTCTATGGGGAAG | 107
CFPTEN-2 | AGGTGAAGATATATTCCTCCAATTCAGGACCCTCACGACGGGTAGACAAGTTCATGTACTTTGAGTTCCCTCAGCCGTTACCTGTGTGTGGTGATATCAAAGTAGAGTTCT | 108
CFKIT-1 | GAGACTTGGCAGCCAGAAATATCCTCCTTACTCATGGTCGGATCACAAAGATTTGTGATTTTGGTCTAGCCATAGACATCACGAATGATTCTAATTATGTGGTTAAAGGAAACGTGAG | 109
CFKIT-2 | TATTTTTCCCTTTCTCCCCACAGAAACCTATGTATGAAGTACAGTGGAAGGATGTTGAGGAGATAAATGGAAACAATTATGTTTACATAGACCCAACACAACTTCCTTATGATCACAAATGGGAGTTTC | 110
CFKIT-3 | GTTTTCCTGTAGCAAAACCAGAAATCCTGACTTACGACAGGCTAGTGAATGGCATGCTCCAATGTGTGGCAGCAGGATTCCCAGAGCCCACAATAGATTGGTATTTTT | 111
CFEG-1 | AGAAGGTGAGAAAGTTAAAATTCCCGTCGCTATGAAGGAATTAAGAGAAGCAACATCTCCGTAAGCCAACAAGGAAATCCTCGATGTGAGTTTCTGCTTTGCTGTGTGGGGGTC | 112
CFEG-2 | CCGCAGCATGTCAAGATCACAGATTTTGGGCTGGACAAACAGCTGGGTGCGGAAGAGAAAGAATACCATGCAGAAGGAGGCAAAGTAAGGAGGTGGCTTTAG | 113
CFEG-3 | GCCTCACCTCCACCGTGCAGCTCATGACGTAGCTCATGCCCTTCGGCTGCCTCCTGGACTATGTCCGGGAACACAAAGACAATA | 114
CFAKT1 | TCTCACCACCCGCACGTCTGTAGAGGACTACATCAAGACCTGGCGGCCACGCTACTTCCTCCTCAAGAATGATGGCACCTTCATTGG | 115
CFATM | TGTACTTCCATACTTGATTCATGATATTTTACTCCTAGATACGAATGAATCATGGAGAAATCTGCTTTCTACACATGTTCAGGGATTTTTCACCAGCTGTCTTCGACACTTCTCGC | 116
CFAPC | CACCACCTCCTCAAACAGCTCAAACCATGCGATAAGTACCTAAAAATAAAGCACCTACTGCTGAAAAGAGAGAGAGTGGACCTAAGCAAGCTGCAGT | 117
CFFGFR-1 | GCTCTGGGAGATCTTCACGCTGGGGGACTCCCCGTATCCCGGCATCCCTGTGGAGGAGCTCTTCAAGCTGCTGAAGGAGGGCCACCGCATGGACAAGCCCGCCA | 118
CFFGFR-2 | TGGCCCCTGAGCGTCATCTGCCCCCACTGAGCGCTCCACGCACCGGCCCATCCTGCAGGCGGGGCTGCCGGCCAACCAGACGGCGGTGCTGGGCAGCGACGTGGAGTTCC | 119
CFFGFR-3 | AGGAGCTGGTGGAGGCTGACGAGGCGGGCAGTATGTATACAGGCATCCTCAGCTACGGGGTGGGCTTCTTCCTGTTCATCCTGGTGGTGGCGGCTGTGA | 120
CFMET-1 | GCATTCCCTATCAAATATGTCAACGACTTCATCAACAAGATAGTCAACAAAAACAATGTGAGATGTCTCCAGCATTTTTACGGACCCAATCATGAGCACTGCTTTAATAGGGTAA | 121
CFMET-2 | GCTGATTTTGGTCTTGCCAGAGACATGTATCATAAACAATACTATAGTGTACACAACAAAACAGGTGCAAAGCTGCCAGTGAAGTGGATGGCTTTGGAAAGTCTG | 122
CFSTK-1 | CCGCATCGACTCCACCGAGGTCATCTACCAGCCGAGCCGCATGCGGGCCAAGCTCATCGGCAAGTACCTGATGGGGGACCTGCTGGGGGAAGGCTCTTACGGCAAGGTGAAGGAGGTGCTGGACTCGGAG | 123
CFSTK-2 | CCGTACTTGGAGGACCTGCACGGCGCGGATGAGGACGAGGACCACTTCGACATCGAGGATGACATCATCTACACTCAGGACTTCACGGTGCCCGGTGAGTCTGGCGGGGG | 124
CFKRAS-1 | TATAGTCACATTTTCATTATTTTTATTATAAGGCCTGCTGAAAATGACTGAATATAAACTTGTGGTAGTTGCAGATGGTGGCGTAGGCAAGAGTGCCTTGACGATAC | 125
CFKRAS-2 | CTTGGATATTCTCGACACAGCAGGTCAAGACGAGTACTGTGCAATGAGGGACCAGTACATGAGGACTGGGGAGGGCTTTCTTTGTGTATTTGCCATAAATAATACTAAA | 126
CFNRAS-1 | TGTAGATGTGGCTCGCCAATTAACCCTGATTACTGGTTTCCAACAGGTTCTTGCTGGTGTGAAATGACTGAGTACAAACTGGTCGTGGATGGAGCAGGTGGTGTTGGGAAAAGCGCACTGACAAT | 127
CFNRAS-2 | GTTGGACATACTGGATACAGCTGGACAAGAAGAGCACAGTGACATGAGAGACCAATACATGAGGACAGGCGAAGGCTTCCTCTGTGTATTTGCCATCAATAATAGCAAGTCAT | 128
CFHRAS-1 | GGTGGGGCAGGAGACCCTGTAGGAGGACCCCGGGCCGCAGGCCCCTGAGGAGCGATGACGGAATATAAGCTGGTGGTCGTGGACGCCGGCGGTGTGGGCAAGAGTGCGCTGACCATCC | 129
CFHRAS-2 | TGGACATCCTGGATACCGCCGGCCAGGAGTACTACAGCGCCATGCGGGACCAGTACATGCGCACCGGGGAGGGCTTCCTGTGTGTGTTTGCCATCAACAACACCAAGTCTTTTGAGGA | 130
CFK-Ctrl | TGTTCCTGTTTATAATATTGACAAAACACCTTAGCGGATGACATTTAAGAATTCTAAAAGTCCTAATATATGTAATATATATTCAGTTGCCTGAAGAGAAACATAAAGAATCCTTTCTTAAT | 131
CFB-Ctrl | ATGTCAGGACAAAGTCCGGATTGAATATAACTCTGCTTTATATTATAGGCCTATGAAGAATACACCAGCAAGCTAGATGCACTCCAACAAAGAGAACAACAGTTATTGGAATCTCTGGG | 132

*All oligos synthesized at 40 nmole scale with cartridge purification.

Conditions were optimized so that on average, more than one copy of each original DNA template molecule would be present at the beginning of the next amplification step. Typically between 2 and 10 cycles of primer-extension were carried out. Primer extension was performed using Accuprime Taq polymerase (Invitrogen) as described below.

Primer-Extension Reaction Setup (30 μL Reaction):

Component | Volume
Purified template DNA (with co-eluted carrier RNA [cRNA]) | 20 μL (or less)
100 copies of control mutant DNA in 10 mM Tris (with 300 ng per mL cRNA) | (as needed)
10 mM Tris with cRNA (300 ng per mL) | (as needed for final 30 μL volume)
10× concentrated Accuprime Buffer #2 | 3 μL
Mix of 40 modular barcoded primers (50 μM total stock) (final ~200 nM each) | 4.8 μL
Accuprime Taq polymerase | 0.6 μL
Total | 30 μL

Temperature Cycling Conditions (Carried Out on a BioRad iCycler®):

a. 94° C. for 120 sec
b. 94° C. for 20 sec
c. 60° C. for 20 sec (this step provides more time for annealing)
d. 55° C. for 1 min (may decrease this temperature to improve primer annealing)
e. 72° C. for 20 sec
f. repeat steps b-e for 3 more cycles (total 4 cycles)
g. 4° C. for up to 20 minutes

As quickly as possible once the reactions had reached 4° C., 1 μL of 300 mM EDTA was added (to make a final concentration of 10 mM) to terminate the activity of the polymerase. Each tube was agitated gently to ensure even mixing of the EDTA. Because the primer-extended molecules had sample-specific barcodes attached, the products of all reactions that were derived from different samples could be pooled together into a single tube.
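The concentrations quoted in the reaction setup follow from simple dilution arithmetic, which can be checked as sketched below. The helper name `final_concentration` is hypothetical; the volumes and stock concentrations are those stated above.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after diluting stock_volume of stock into total_volume (same units)."""
    return stock_conc * stock_volume / total_volume


# 4.8 uL of the 50 uM (total) primer mix in a 30 uL reaction, split over 40 primers.
primer_total_uM = final_concentration(50.0, 4.8, 30.0)
per_primer_nM = primer_total_uM * 1000 / 40

# 1 uL of 300 mM EDTA added to the 30 uL reaction (31 uL final volume).
edta_mM = final_concentration(300.0, 1.0, 31.0)

print(f"{primer_total_uM:.1f} uM total primers, {per_primer_nM:.0f} nM each")
print(f"{edta_mM:.1f} mM EDTA")
```

The EDTA value comes out slightly below 10 mM because the added microliter increases the final volume to 31 μL.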

Purification of Primer-Extended Products

The purification of primer-extended products was achieved via pull-down and elution steps using complementary biotinylated “capture” oligonucleotides and streptavidin-agarose beads (Thermo-Fisher). First, a mixture of complementary biotinylated oligonucleotides was added to the pooled primer-extension products. These oligonucleotides were designed to anneal to the specific sequences that should be produced if the primers were extended using the intended genomic DNA target region as their templates. A list of the 40 biotinylated oligos that were used in the present example is included in Table 8. By capturing with these biotinylated oligos, it was possible to ensure that only the specifically extended primers were isolated, and that any un-extended primers and any primers that were extended on non-specific DNA templates were not pulled down. For every 30 microliter reaction volume (plus 1 microliter of EDTA added), a final concentration of 200 nM of each biotinylated oligo was added (by addition of 3.5 μL of an 80 micromolar oligonucleotide mix for a final total concentration of 8 micromolar biotinylated oligos [all 40 oligos]). Annealing of the biotinylated capture oligos with the primer-extended products was achieved by heating the mixture to 95° C. for 30 seconds, then to 70° C. for 20 seconds, then cooling by 2.5° C. every 20 seconds until the mixtures reached 25° C.
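The capture-oligo concentration and the annealing ramp described above can be written out explicitly, as sketched below. The function name and the representation of the program as (temperature, seconds) pairs are hypothetical, and a 20 second hold at the final 25° C. step is an assumption, since the text does not specify one.

```python
def annealing_program(denature=(95.0, 30), start=70.0, end=25.0, step=2.5, hold_s=20):
    """Return (temperature in deg C, hold time in s) steps: denature, then step down."""
    steps = [denature]
    temp = start
    while temp >= end:
        steps.append((temp, hold_s))  # assumed: every step, including the last, is held
        temp -= step
    return steps


program = annealing_program()

# 3.5 uL of the 80 uM capture-oligo mix added to 31 uL (30 uL reaction plus 1 uL EDTA).
capture_total_uM = 80.0 * 3.5 / (31.0 + 3.5)
per_oligo_nM = capture_total_uM * 1000 / 40

print(len(program), "steps; last step:", program[-1])
print(f"{capture_total_uM:.2f} uM total, {per_oligo_nM:.0f} nM per capture oligo")
```

The computed values round to the approximately 8 micromolar total and 200 nM per oligo stated above.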

TABLE 8 List of biotinylated target-specific capture DNA oligonucleotides

NAME       DNA OLIGO SEQUENCE  SEQ ID NO:
BNTP53-1   5′-BIO-CAAGATGTTTTGCCAACTGGCC  133
BNTP53-2   5′-BIO-CCTGTCGTCTCTCCAGCCCCAG  134
BNTP53-3   5′-BIO-GGCTGGTTGCCCAGGGTCCC  135
BNTP53-4   5′-BIO-GCCACTGACAACCACCCTTAACC  136
BNTP53-5   5′-BIO-CCTCATCTTGGGCCTGTGTTATCT  137
BNTP53-6   5′-BIO-GGGCAGCTCGTGGTGAGGC  138
BNPIK-1    5′-BIO-AAGAAACAGAGAATCTCCATTTTAGCAC  139
BNPIK-2    5′-BIO-TCAGTTCAATGCATGCTGTTTAATTGTG  140
BNBRAF     5′-BIO-CTTACCATCCACAAAATGGATCCAGAC  141
BNFoxL2    5′-BIO-CGGAAGGGCCTCTTCATGCGGC  142
BNGNAS     5′-BIO-GGCTTACTGGAAGTTGACTTTGTCCAC  143
BNCTNN     5′-BIO-AGGACTTGGGAGGTATCCACATCC  144
BNPPP-1    5′-BIO-CTGCTGCCTCAGGATCCCCGTCC  145
BNPPP-2    5′-BIO-GTGGGACAATGTCAACGGTCGCT  146
BNPTEN-1   5′-BIO-CCCATAGAAATCTAGGGCCTCT  147
BNPTEN-2   5′-BIO-CTCTACTTTGATATCACCACACACAGG  148
BNKIT-1    5′-BIO-CTTGGCAGCCAGAAATATCCTCCTTACTC  149
BNKIT-2    5′-BIO-CTCCCATTTGTGATCATAAGGAAGTTG  150
BNKIT-3    5′-BIO-ATACCAATCTATTGTGGGCTCTGG  151
BNEG-1     5′-BIO-CCCACACAGCAAAGCAGAAAC  152
BNEG-2     5′-BIO-AGCCACCTCCTTACTTTGCCTCC  153
BNEG-3     5′-BIO-GTCTTTGTGTTCCCGGACATAGTCC  154
BNAKT1     5′-BIO-TGAAGGTGCCATCATTCTTGAGGAG  155
BNATM      5′-BIO-GAAGTGTCGAAGACAGCTGGTGAA  156
BNAPC      5′-BIO-CAGCTTGCTTAGGTCCACTCTCTC  157
BNFGFR-1   5′-BIO-GGGCTTGTCCATGCGGTGGCC  158
BNFGFR-2   5′-BIO-CTCCACGTCGCTGCCCAGCACC  159
BNFGFR-3   5′-BIO-CAGCCGCCACCACCAGGATGAAC  160
BNMET-1    5′-BIO-CCTATTAAAGCAGTGCTCATGATTGG  161
BNMET-2    5′-BIO-CTTTCCAAAGCCATCCACTTCAC  162
BNSTK-1    5′-BIO-GAGTCCAGCACCTCCTTCACCTTG  163
BNSTK-2    5′-BIO-CGCCAGACTCACCGGGCACC  164
BNKRAS-1   5′-BIO-GTCACATTTTCATTATTTTTATTATAAGGCCTGC  165
BNKRAS-2   5′-BIO-GTATTATTTATGGCAAATACACAAAGAAAGC  166
BNNRAS-1   5′-BIO-GATGTGGCTCGCCAATTAACCCTGA  167
BNNRAS-2   5′-BIO-CTTGCTATTATTGATGGCAAATACACAG  168
BNHRAS-1   5′-BIO-GGGCAGGAGACCCTGTAGGAG  169
BNHRAS-2   5′-BIO-CAAAAGACTTGGTGTTGTTGATGGCA  170
BNK-Ctrl   5′-BIO-AGAAAGGATTCTTTATGTTTCTCTTCAGG  171
BNB-Ctrl   5′-BIO-GAGATTCCAATAACTGTTGTTCTCTTTGT  172

5′-Bio = 5′-Biotin

Then, 7 μL of high capacity streptavidin-agarose bead slurry (Thermo-Fisher) was added (per 30 μL primer-extension reaction). Tubes were turned end-over-end constantly for at least 2 hours to promote binding of biotinylated oligos to the streptavidin beads. Beads were then centrifuged briefly, and any unbound supernatant was carefully removed, avoiding aspiration of any beads. The beads were then washed in about 200 μL of 10 mM Tris pH 7.6 and 50 mM NaCl (referred to hereafter as wash buffer). Beads were suspended in wash buffer by gentle agitation, then were briefly centrifuged, and the supernatant wash buffer was removed and discarded. A second wash was performed in the same way, except that once the beads were suspended, they were incubated at 45° C. for 30 minutes while the tube was turned end-over-end (this was to promote dissociation of any DNA molecules that may have annealed non-specifically to the biotinylated capture oligos). The beads were again centrifuged briefly, and the supernatant wash buffer was removed. The captured primer-extended products were eluted from the surface of the washed beads by heat-denaturation. Since the biotin-streptavidin interaction was not substantially disrupted by heating at 95° C., only the captured primer-extended products were eluted from the beads, whereas the biotinylated capture oligos remained bound to the beads. Elution was carried out directly into the pre-amplification PCR cocktail as described below.

Multiplexed Pre-Amplification PCR

The purified primer-extension products were eluted directly into a cocktail of buffer, nucleotides, and primers that was used to carry out the multiplexed pre-amplification reaction. The primer-extended DNA was eluted into the following cocktail:

Component  Volume
Molecular grade water  enough to make 100 μL total reaction volume
10x Accuprime Taq PCR buffer #1 or Pfx buffer (with dNTPs already added)  10 μL
Forward primer mix for 40 amplicons (20 uM Fwd mix stock, 200 nM final each)  40 μL
Universal reverse primer - ExtV2Rev (2 uM stock, 200 nM final)  10 μL
Total  100 μL

The beads in the pre-amplification cocktail were heated at 95° C. for 30 seconds, were quickly and gently centrifuged, and the supernatant was transferred to a clean PCR tube. When the cocktail reached room temperature, 2 μL of Accuprime hotstart Taq polymerase (or 1 μL Accuprime Pfx) was added to the tube, and mixed by pipetting up and down. Then 30 μL of mineral oil was added to prevent evaporation during thermal cycling, which was carried out as follows:

a. 94° C. for 2 minutes (95° C. if using Accuprime Pfx)

b. 94° C. for 20 seconds (95° C. if using Accuprime Pfx)

c. 63° C. for 30 seconds

d. 72° C. for 20 seconds

e. repeat (b) to (d) for a total of 15 cycles

f. 72° C. for 2 minutes
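The stated final primer concentrations follow from the stock concentrations and volumes in the cocktail. The short sketch below checks that arithmetic. It assumes that the 20 uM figure is the combined concentration of all 40 forward primers in the mix (which is what makes 200 nM final for each primer come out) and it ignores the small volume of polymerase added afterwards; the function and variable names are illustrative.

```python
def final_conc(stock_conc, volume_added_ul, total_volume_ul):
    """Concentration after dilution (same units as stock_conc)."""
    return stock_conc * volume_added_ul / total_volume_ul


TOTAL_UL = 100.0
N_FORWARD_PRIMERS = 40

# Forward primer mix: 40 uL of a 20 uM (20,000 nM) combined stock.
fwd_total_nM = final_conc(20_000, 40, TOTAL_UL)
fwd_each_nM = fwd_total_nM / N_FORWARD_PRIMERS

# Universal reverse primer ExtV2Rev: 10 uL of a 2 uM (2,000 nM) stock.
rev_nM = final_conc(2_000, 10, TOTAL_UL)

print(f"forward primers: {fwd_each_nM:.0f} nM each, {fwd_total_nM:.0f} nM combined")
print(f"reverse primer:  {rev_nM:.0f} nM")
```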

Then, 11 μL of 100 mM EDTA was added (10 mM final concentration) to the completed reaction to chelate divalent cations and thus terminate polymerase activity.

The forward primers used in this pre-amplification reaction were designed to hybridize to regions on the target sequences that were nested relative to the binding sites of the biotinylated capture oligonucleotides that were used in the first primer extension reaction. This nested design provided an additional level of specificity so that the desired target DNAs would be preferentially amplified. The sequences of the universal pre-amplification reverse primer (ExtV2Rev), and the 40 different nested forward primers are listed in Table 9.

TABLE 9 List of 40 forward primers and the single universal reverse primer (ExtV2Rev) used for the pre-amplification reaction

Name             DNA Sequence  SEQ ID NO:
ExtV2REV         CGAGACGGATCAAGCAGAAGACG  173
ExF-TP53-1       GCCAACTGGCCAAGACCTGC  174
ExF-TP53-2       CTCCAGCCCCAGCTGCTCAC  175
ExF-TP53-3       GTCCCCAGGCCTCTGATTCCTC  176
ExF-TP53-4       CCTCCCAGAGACCCCAGTTGC  177
ExF-TP53-5       TGGGCCTGTGTTATCTCCTAGGTTG  178
ExF-TP53-6       GCAGCTCGTGGTGAGGCTCC  179
ExF-PIK3CA-1     AGAAACAGAGAATCTCCATTTTAGCACTTACC  180
ExF-PIK3CA-2     TTCAATGCATGCTGTTTAATTGTGTGGAAG  181
ExF-BRAF         TCCACAAAATGGATCCAGACAACTGTTC  182
ExF-FoxL2        GGCGCCGGTAGTTGCCCTTC  183
ExF-GNAS         GGAAGTTGACTTTGTCCACCTGGAAC  184
ExF-CTNNB1       GGAGGTATCCACATCCTCTTCCTCAG  185
ExF-PPP2R1A-1    CGACTCCCAGGTACTTCCGGAAC  186
ExF-PPP2R1A-2    TGTCAACGGTCGCTCATCTACCTC  187
ExF-PTEN-1       CCATAGAAATCTAGGGCCTCTTGTGC  188
ExF-PTEN-2       CACCACACACAGGTAACGGCTG  189
ExF-KIT-1        GAAATATCCTCCTTACTCATGGTCGGATCA  190
ExF-KIT-2        CCCATTTGTGATCATAAGGAAGTTGTGTTG  191
ExF-KIT-3        GTGGGCTCTGGGAATCCTGCTG  192
ExF-EGFR-1       CCACACAGCAAAGCAGAAACTCAC  193
ExF-EGFR-2       ACCTCCTTACTTTGCCTCCTTCTGC  194
ExF-EGFR-3       GTGTTCCCGGACATAGTCCAGGAG  195
ExF-AKT1         GCCATCATTCTTGAGGAGGAAGTAGC  196
ExF-ATM          AGACAGCTGGTGAAAAATCCCTGAAC  197
ExF-APC          TGCTTAGGTCCACTCTCTCTCTTTTCAG  198
ExF-FGFR3-1      GCGGTGGCCCTCCTTCAGCAG  199
ExF-FGFR3-2      CCAGCACCGCCGTCTGGTTG  200
ExF-FGFR3-3      CCACCAGGATGAACAGGAAGAAGC  201
ExF-MET-1        CAGTGCTCATGATTGGGTCCGT  202
ExF-MET-2        GCCATCCACTTCACTGGCAGC  203
ExF-STK11-1      CTTCACCTTGCCGTAAGAGCCTTC  204
ExF-STK11-2      CTCACCGGGCACCGTGAAGTC  205
ExF-KRAS-1       CATTATTTTTATTATAAGGCCTGCTGAAAATGACTGA  206
ExF-KRAS-2       TGGCAAATACACAAAGAAAGCCCTCC  207
ExF-NRAS-1       CAATTAACCCTGATTACTGGTTTCCAACAG  208
ExF-NRAS-2       GGCAAATACACAGAGGAAGCCTTCG  209
ExF-HRAS-1       CAGGAGACCCTGTAGGAGGACC  210
ExF-HRAS-2       TGATGGCAAACACACACAGGAAGC  211
ExF-KRAS-Cntrl   AGGATTCTTTATGTTTCTCTTCAGGCAACTG  212
ExF-BRAF-Cntrl   ACTGTTGTTCTCTTTGTTGGAGTGCATC  213
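The nested relationship between a capture oligonucleotide and the corresponding pre-amplification forward primer can be illustrated by comparing their sequences. The sketch below is only an illustration using the TP53-1 pair from Tables 8 and 9; the helper function is hypothetical and is not part of the described method. It reports how many bases the two oligonucleotides share and how far the 3′ end of the forward primer reaches beyond the capture oligonucleotide.

```python
def suffix_prefix_overlap(upstream, downstream):
    """Length of the longest suffix of `upstream` that is a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return k
    return 0


capture_tp53_1 = "CAAGATGTTTTGCCAACTGGCC"  # BNTP53-1 without the 5' biotin (Table 8)
forward_tp53_1 = "GCCAACTGGCCAAGACCTGC"    # ExF-TP53-1 (Table 9)

shared = suffix_prefix_overlap(capture_tp53_1, forward_tp53_1)
three_prime_extension = forward_tp53_1[shared:]

print(f"shared bases: {shared}")
print(f"3' bases beyond the capture oligo: {three_prime_extension}")
```

For this pair the forward primer begins inside the capture site and its last nine bases lie further into the target, which is the sense in which the primer is nested.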

Purification of the Products of the Pre-Amplification Reaction

The products of the pre-amplification reaction were purified using a QIAquick® PCR purification kit (Qiagen) according to the manufacturer's instructions. This removed the enzyme, dNTPs, and unincorporated primers from the double-stranded reaction products. Elution of the DNA from the column was carried out in 60 μL of EB buffer (composed of 10 mM Tris). This elution volume allowed 1 μL to be used in each of the 40 individual PCRs (see next section), with approximately 20 μL left over in case any failed reactions needed to be repeated. The purified DNA can be stored at 4° C. for several days if necessary. Extra care was taken when handling any of the amplified products to avoid contamination of these products into the reagents used for reaction set-up (separate work-spaces were maintained for reagents and for amplification products).

Separate PCR Amplification of Individual Gene Targets (Mutation Hotspots)

After purification, products of the pre-amplification reaction were subjected to further amplification by PCR in separate tubes (one tube for each of the 40 target gene regions). These individual PCRs were performed in order to provide an additional layer of amplification specificity, since the multiplexed pre-amplification reaction was likely to have produced many spurious products in addition to the amplicons of interest. Using PCR primers that were nested relative to the primers used in the previous pre-amplification step allowed the desired target DNAs to be preferentially amplified. Also, by carrying out each individual PCR to saturation and using the same concentration of primers in each reaction, similar numbers of copies of each target region could be produced. Normalization of molecular counts in this way allowed a similar sequencing depth to be achieved for each target.

A different gene-specific forward primer was paired with a universal reverse primer in each of the 40 PCR tubes. Both primers were nested relative to the primers used in the pre-amplification reaction so that further amplification specificity could be achieved (a nested primer is designed so that its 3′-end hybridizes to a region within the desired target sequence that was flanked by the primers used in the earlier round of amplification). The forward primers contained extra sequences on their 5′-ends that were necessary for subsequent sequencing on an Illumina flow cell. The reverse primer was also designed to produce a product that was compatible with the Illumina sequencer without the need for attachment of additional adapter sequences. The sequences of the universal reverse PCR primer (called IntV2Rev) and the 40 different, target-specific forward PCR primers are listed in Table 10. A 4 nucleotide stretch of degenerate sequence was included in the forward primer to provide greater sequence diversity at the first few read positions, thereby improving cluster discrimination on the Illumina sequencer. Although these primers were designed to be compatible with the Illumina next-generation sequencing system, the method can relatively easily be adapted to other sequencing platforms. The PCR setup of each individual tube was as follows:

Component  Volume
Molecular grade water  4.8 μL
10x Accuprime Taq Buffer #1  1 μL
Forward gene-specific primer (1 uM stock, 200 nM final)  2 μL
Universal reverse primer IntV2Rev (2 uM stock, 200 nM final)  1 μL
Template DNA purified after pre-amplification reaction  1 μL
Accuprime Taq DNA polymerase  0.2 μL
Total  10 μL

TABLE 10 List of 40 nested forward primers and the single nested universal reverse primer (IntV2Rev)

Name             DNA Sequence  SEQ ID NO:
IntV2REV         CAAGCAGAAGACGGCATACGAGA  217
InF-TP53-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTGCAGCTGTGGGTTGATTCCAC  218
InF-TP53-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCAGCTGCTCACCATCGCTATCT  219
InF-TP53-3       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTCCTCACTGATTGCTCTTAGGTCTGG  220
InF-TP53-4       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCAAACCAGACCTCAGGCGGCTC  221
InF-TP53-5       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGCTCTGACTGTACCACCATCCAC  222
InF-TP53-6       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTCCCCTTTCTTGCGGAGATTCTCT  223
InF-PIK3CA-1     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCACTTACCTGTGACTCCATAGAAAATCTTTC  224
InF-PIK3CA-2     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGAAGATCCAATCCATTTTTGTTGTCCAGC  225
InF-BRAF         AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCAAACTGATGGGACCCACTCCATC  226
InF-FoxL2        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCGGTAGTTGCCCTTCTCGAACATG  227
InF-GNAS         AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTACTTGGTCTCAAAGATTCCAGAAGTCAG  228
InF-CTNNB1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCTCAGGATTGCCTTTACCACTCAG  229
InF-PPP2R1A-1    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCGGAACCTGTGCTCAGATGACAC  230
InF-PPP2R1A-2    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCTGTGAACTTGTCAGCCACCATGTAG  231
InF-PTEN-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCTTTAAAAATTTGCCCCGATGTAATAAATATGC  232
InF-PTEN-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGGCTGAGGGAACTCAAAGTACATGAAC  233
InF-KIT-1        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGGATCACAAAGATTTGTGATTTTGGTCTAGC  234
InF-KIT-2        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGGTCTATGTAAACATAATTGTTTCCATTTATCT  235
InF-KIT-3        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGCCACACATTGGAGCATGCCA  236
InF-EGFR-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGAAACTCACATCGAGGATTTCCTTGTTG  237
InF-EGFR-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTGCATGGTATTCTTTCTCTTCCGCAC  238
InF-EGFR-3       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGAGGCAGCCGAAGGGCATGAG  239
InF-AKT1         AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGTGGCCGCCAGGTCTTGATG  240
InF-ATM          AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCATGTGTAGAAAGCAGATTTCTCCATGATTC  241
InF-APC          AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTTCAGCAGTAGGTGCTTTATTTTTAGGTAC  242
InF-FGFR3-1      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCAGCTTGAAGAGCTCCTCCACAG  243
InF-FGFR3-2      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCTGCAGGATGGGCCGGTG  244
InF-FGFR3-3      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCACCCCGTAGCTGAGGATGC  245
InF-MET-1        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGCTGGAGACATCTCACATTGTTTTTGTTG  246
InF-MET-2        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGCTTTGCACCTGTTTTGTTGTGTACAC  247
InF-STK11-1      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCCATCAGGTACTTGCCGATGAG  248
InF-STK11-2      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCTGAGTGTAGATGATGTCATCCTCGATG  249
InF-KRAS-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTGCTGAAAATGACTGAATATAAACTTGTGGTA  250
InF-KRAS-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCAGTCCTCATGTACTGGTCCCTCATT  251
InF-NRAS-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTCTTGCTGGTGTGAAATGACTGAGTAC  252
InF-NRAS-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTCGCCTGTCCTCATGTATTGGTCT  253
InF-HRAS-1       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTCCTGAGGAGCGATGACGGAATATAAG  254
InF-HRAS-2       AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTATGTACTGGTCCCGCATGGCG  255
InF-KRAS-Cntrl   AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTAGGACTTTTAGAATTCTTAAATGTCATCCGC  256
InF-BRAF-Cntrl   AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTTCTAGCTTGCTGGTGTATTCTTCATAGG  257

Mineral oil (20 μL) was added to each tube to prevent evaporation during PCR. Again, both Accuprime Taq polymerase as well as Accuprime Pfx were tested for PCR amplification, and both worked. The temperature cycling conditions used for PCR were as follows:

a. 94° C. for 2 minutes (95° C. if using Accuprime Pfx)

b. 94° C. for 20 seconds (95° C. if using Accuprime Pfx)

c. 64° C. for 30 seconds

d. 72° C. for 20 seconds

e. repeat b to d for a total of 36 to 45 cycles (36 cycles for Taq and 45 cycles for Pfx).

f. 72° C. for 2 minutes

g. 4° C. until removed from thermal cycler

While the PCR tubes were still at 4° C., 5 μL of 30 mM EDTA was added to inactivate the polymerase (10 mM EDTA final). This was added under the mineral oil layer, and was pipetted up and down to mix. Products from all 40 reactions were pooled into a single tube (equal volumes from each of the 40 reactions were added to the final mix).
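The three EDTA additions used in this example (after primer extension, after pre-amplification, and after the individual PCRs) each aim at a final concentration of about 10 mM. The sketch below checks the arithmetic for the volumes given in the text. It assumes that only the aqueous reaction volume and the added EDTA contribute to the final volume, so the mineral oil overlay and the small polymerase volumes are ignored; the names are illustrative.

```python
def final_mM(stock_mM, added_ul, reaction_ul):
    """Final concentration after adding `added_ul` of stock to `reaction_ul`."""
    return stock_mM * added_ul / (reaction_ul + added_ul)


edta_final = {
    "primer extension (1 uL of 300 mM into 30 uL)": final_mM(300, 1, 30),
    "pre-amplification (11 uL of 100 mM into 100 uL)": final_mM(100, 11, 100),
    "individual PCR (5 uL of 30 mM into 10 uL)": final_mM(30, 5, 10),
}

for step, conc in edta_final.items():
    print(f"{step}: {conc:.2f} mM")
```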

Preparation of DNA for Next-Generation Sequencing

The pooled PCR reaction products were purified on a 2% agarose gel with ethidium bromide and 1×TBE buffer. Since all PCR products were of a similar final length, the pooled products appeared on the gel as a somewhat diffuse band. This diffuse band was excised from the gel using a fresh scalpel blade, ensuring that the gel was cut a few millimeters above and below the visible band to include any low-intensity bands that may have run faster or slower and were not well-visualized. Using a QIAquick® Gel Extraction kit (Qiagen) according to the manufacturer's instructions, the DNA was isolated from the gel slice. The DNA was eluted into 50 μL of elution buffer, EB.

Next-Generation Sequencing

To prepare the sample for loading onto an Illumina HiSeq flow cell, the concentration of the DNA was measured using an Agilent Bioanalyzer®, and the DNA was diluted to the concentration recommended by Illumina. In order to increase sequence diversity on the flow cell, Phi-X control DNA (Illumina) was added so that the total molar amount of Phi-X DNA was approximately 30% of the final sample that was loaded onto the flow cell.

Cluster formation was carried out on the flow cell according to Illumina's protocol. The sample was loaded onto a single lane of a flow cell. The sequencing was performed on a HiSeq® 2000 instrument in multiplexed paired-end mode, with a read length of 75 base pairs in each direction. An index read was also performed, and the length of the index read was increased from the standard 7 cycles up to 13 cycles so that our longer custom barcodes and MLT sequences could be appropriately read. A control lane was designated that contained either Phi-X DNA or genomic DNA so that matrix generation for phasing/prephasing would be based on a sample having greater sequence diversity than was present in our sample. Demultiplexing of the sequences was performed using custom computer code.

Outline of Algorithm for Sequence Analysis

The sequences that survived the filtering process consisted of the PCR amplicons of interest as well as sequences derived from control Phi-X DNA. Our algorithm effectively ignored Phi-X sequences because those sequences did not conform to the filtering requirements described below.

A computer algorithm was designed to sort, align, and count the millions of sequences that were generated by the high-throughput sequencer. The sequence elements used in the algorithm are identified in FIG. 14. The following steps provide an outline of the process and rationale used to analyze the sequence data:

1. Only clonal sequences that had passed Illumina's chastity filter were included in the analysis. Any sequences that had a “.” at any position were eliminated (these were counted as dot rejects). If an unusually large number of dots were found at a particular sequence position (indicating sequencer failure at that cycle), the filter was modified in order to avoid filtering out an unreasonably large fraction of sequences.

2. The 8 nucleotide barcode from the index read (read #2) was used to assign each filtered clonal sequence to a sample-specific bin. The sequence in the region of the barcode was expected to be in the format BBBBBBBBNNAT, where “B” was a barcode nucleotide and “N” was a nucleotide belonging to the molecular lineage tag (a position designated as N had an approximately equal probability of being A, C, G, or T in any given molecule). In order for the clonal sequence to be assigned to a sample-specific bin, the following conditions had to be satisfied:

    a. the sequence BBBBBBBB at positions 1-8 had to exactly match the reverse complement of one of the 96 barcodes listed in Table 6;

    AND

    b. the nucleotides at positions 11 and 12 of the index read had to be AT.

If a clonal sequence did not satisfy both of the above conditions, it was classified as a barcode reject. In case the lack of sequence diversity at positions 11 and 12 caused the read quality to be greatly diminished, leading to a high rate of miscalls or “.” calls, requirement (b) was optionally modified or eliminated.

3. Each clonal sequence that was assigned to a sample-specific bin was further sub-classified according to the targeted gene segment from which it arose. The primer sequences from both the forward and reverse reads were used to assign each clone to a particular gene segment. In the present example, 40 distinct gene segments were analyzed. In order to assign a clonal sequence to a gene target bin, the conditions (a), (b), and (c) had to be satisfied.

    a. In the forward read, the first 8 nucleotides of the primer sequence (designated by a “F” in FIG. 14) had to exactly match the first 8 nucleotides of one of the 40 forward gene-specific primer sequences.

    b. In the reverse read, the first 8 nucleotides of the primer sequence (designated by a “R” in FIG. 14) had to exactly match the first 8 nucleotides of one of the 40 reverse gene-specific primer sequences.

    c. The forward primer and reverse primer reads had to lead to assignment of each clone to the same gene segment. Assignment of a single clonal sequence to more than one gene segment bin was not permissible.

If a clonal sequence failed to satisfy these conditions, it was classified as a gene segment reject.

4. Each clonal sequence that was successfully assigned to a sample-specific barcode bin and to a gene segment bin then had its forward and reverse reads aligned to each other using a Smith-Waterman algorithm, as described in Example 1 (the reverse-complement of read #3 was derived to facilitate alignment). This enabled identification of the region of overlap between the forward and reverse reads. Different lengths of overlap were expected for different gene segments since the forward and reverse read-lengths were constant but the PCR amplicon length differed for different gene segments. The length of overlap could also vary because of the presence of insertion or deletion mutations. The forward and reverse reads were also aligned to the wild-type reference sequence for its assigned gene segment (the full-length wild-type reference sequences are listed in Table 11).
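The binning rules of steps 2 and 3 above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions rather than the custom code referred to in this example: the function names are hypothetical, the sample barcode shown is invented (Table 6 is not reproduced here), and the inputs are assumed to be the index read and the primer portions of the forward and reverse reads, already trimmed to start at the first relevant base. Only two of the 40 gene segments are included, with primer sequences taken from Table 11.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(index_read, barcodes, require_at=True):
    """Return the sample name, or None for a barcode reject (step 2)."""
    lookup = {reverse_complement(bc): name for name, bc in barcodes.items()}
    # Condition (b): positions 11-12 must read AT (optionally relaxed).
    if require_at and index_read[10:12] != "AT":
        return None
    # Condition (a): positions 1-8 must match a reverse-complemented barcode.
    return lookup.get(index_read[:8])


def assign_gene(fwd_region, rev_region, fwd_primers, rev_primers):
    """Return the gene segment, or None for a gene segment reject (step 3)."""
    fwd_hits = [g for g, p in fwd_primers.items() if p[:8] == fwd_region[:8]]
    rev_hits = [g for g, p in rev_primers.items() if p[:8] == rev_region[:8]]
    # Exactly one segment must be named, and both reads must agree on it.
    if len(fwd_hits) == 1 and fwd_hits == rev_hits:
        return fwd_hits[0]
    return None


barcodes = {"sample_01": "AAACCCGG"}  # hypothetical barcode, not from Table 6
fwd_primers = {
    "TP53-1": "TGCAGCTGTGGGTTGATTCCAC",
    "KRAS-1": "GCTGAAAATGACTGAATATAAACTTGTGGTA",
}
# Table 11 shows the reverse primers as their reverse complements.
rev_primers = {
    "TP53-1": reverse_complement("CAAGCAGTCACAGCACAG"),
    "KRAS-1": reverse_complement("AGGCAAGAGTGCCTTGACGAC"),
}

print(assign_sample("CCGGGTTTCAAT", barcodes))
print(assign_gene(fwd_primers["TP53-1"], rev_primers["TP53-1"],
                  fwd_primers, rev_primers))
```

The `require_at` flag mirrors the option of relaxing requirement (b) when low sequence diversity at positions 11 and 12 degrades read quality.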

TABLE 11 List of wild-type reference sequences for all 40 targeted gene segments

Gene Segment  Reference Sequence (FFFF....FFFF[xxxx...xxxx]RRRR...RRRR)  SEQ ID NO:
TP53-1       TGCAGCTGTGGGTTGATTCCAC[acccccgcccggcacccgcgtccgcgccatggccatcta]CAAGCAGTCACAGCACAG  258
TP53-2       CCAGCTGCTCACCATCGCTATCT[gagcagcgctcatggtgggggcagcgcctcac]AACCTCCGTCATGTGCTA  259
TP53-3       TCCTCACTGATTGCTCTTAGGTCTGG[cccctcctcagcatcttatccgagtggaagga]AATTTGCGTGTGGAGTATTTGT  260
TP53-4       CAAACCAGACCTCAGGCGGCTC[atagggcaccaccacactatgtcgaa]AAGTGTTTCTGTCATCCAAATAT  261
TP53-5       GCTCTGACTGTACCACCATCCAC[tacaactacatgtgtaacagttcctgcatgggcggcatgaaccggaggc]CCATCCTCACCATCATCACAG  262
TP53-6       TCCCCTTTCTTGCGGAGATTCTCT[tcctctgtgcgccggtctcctcccaggacaggcacaaacacgcac]CTCAAAGCTGTTCCGTCCCAG  263
PIK3CA-1     CACTTACCTGTGACTCCATAGAAAATCTTTC[tcctgctcagtgatttcagag]AGAGGATCTCGTGTAGAAATTGCA  264
PIK3CA-2     GAAGATCCAATCCATTTTTGTTGTCCAGC[caccatgatgtgcatcat]TCATTTGTTTCATGAAATACTCCAAAGA  265
BRAF         CAAACTGATGGGACCCACTCCATC[gagatttcactgtagc]TAGACCAAAATCACCTATTTTTACTGTT  266
FoxL2        CGGTAGTTGCCCTTCTCGAACATG[tcttcgcaggccgg]GTCCAGCGTCCAGTAGTTGCA  267
GNAS         ACTTGGTCTCAAAGATTCCAGAAGTCAG[gacacggcagcga]AGCAGGTCCTGAAACAAAATTGAG  268
CTNNB1       CCTCAGGATTGCCTTTACCACTCAG[agaaggagctgtggtagtggcaccagaatggattccagagtcc]AGGTAAGACTGTTGCTGCCAG  269
PPP2R1A-1    CCGGAACCTGTGCTCAGATGACAC[ccccatggtgcggcgggcc]GCAGCCTCCAAGCTGGGT  270
PPP2R1A-2    CTGTGAACTTGTCAGCCACCATGTAG[cggacgcgccaggact]TGTCTTCAGCGGCCTGGCT  271
PTEN-1       CCTTTAAAAATTTGCCCCGATGTAATAAATATGC[acatatcattacaccagttcgtcc]CTTTCCAGCTTTACAGTGAATTGCC  272
PTEN-2       GGCTGAGGGAACTCAAAGTACATGAAC[ttgtcttcccgtcgtgtgg]GTCCTGAATTGGAGGAATATATCTTCAT  273
KIT-1        GGATCACAAAGATTTGTGATTTTGGTCTAGC[cagagacatcaagaat]GATTCTAATTATGTGGTTAAAGGAAACGC  274
KIT-2        GGTCTATGTAAACATAATTGTTTCCATTTATCT[cctcaacaaccttccactgtacttcatacatgggttt]CTGTGGGGAGAAAGGGAAAAC  275
KIT-3        GCCACACATTGGAGCATGCCA[ttcacgagcctgtcgtaagtcag]GATTTCTGGTTTTGCTACAGGAAC  276
EGFR-1       GAAACTCACATCGAGGATTTCCTTGTTG[gctttcggagatgttgcttctcttaattccttgatagc]GACGGGAATTTTAACTTTCTCACCG  277
EGFR-2       TGCATGGTATTCTTTCTCTTCCGCAC[ccagcagtttggccagcc]CAAAATCTGTGATCTTGACATGCTT  278
EGFR-3       GAGGCAGCCGAAGGGCATGAG[ctgcgtgatg]AGCTGCACGGTGGAGGTGAG  279
AKT1         GTGGCCGCCAGGTCTTGATG[tactcccct]ACAGACGTGCGGGTGGTC  280
ATM          CATGTGTAGAAAGCAGATTTCTCCATGATTC[atttgtatcttgg]AGTAAAATATCATGAATCAAGTATGGAAGA  281
APC          TTCAGCAGTAGGTGCTTTATTTTTAGGTAC[ttctcgcttg]GTTTGAGCTGTTTGAGGAGGTC  282
FGFR3-1      CAGCTTGAAGAGCTCCTCCACAG[ggatgccggggtacggggagcccc]CCAGCGTGAAGATCTCCCAT  283
FGFR3-2      CCTGCAGGATGGGCCGGTG[cggggagcgctctgtggg]GGCAGATGACGCTCAGGGA  284
FGFR3-3      CCACCCCGTAGCTGAGGATGC[ctgcatacacactgcccgcc]TCGTCAGCCTCCACCAGCG  285
MET-1        GCTGGAGACATCTCACATTGTTTTTGTTG[acgatcttgttgaaga]AGTCGTTGACATATTTGATAGGGAAC  286
MET-2        GCTTTGCACCTGTTTTGTTGTGTACAC[tatagtattctttatcataca]TGTCTCTGGCAAGACCAAAATT  287
STK11-1      CCCATCAGGTACTTGCCGATGAG[cttggcccgcttgcggcgcggctggtaga]TGACCTCGGTGGAGTCGATA  288
STK11-2      CTGAGTGTAGATGATGTCATCCTCGATG[tcgaagaggtcctcgtcctcgtccgcgc]CGTGCAGGTCCTCCAAGTAT  289
KRAS-1       GCTGAAAATGACTGAATATAAACTTGTGGTA[gttggagctggtggcgt]AGGCAAGAGTGCCTTGACGAC  290
KRAS-2       CAGTCCTCATGTACTGGTCCCTCATT[gcactgtactcctcttg]ACCTGCTGTGTCGAGAATATCG  291
NRAS-1       TCTTGCTGGTGTGAAATGACTGAGTAC[aaactggtggtggttggagcaggtggtgt]TGGGAAAAGCGCACTGAT  292
NRAS-2       TCGCCTGTCCTCATGTATTGGTCT[ctcatggcactgtactcttcttg]TCCAGCTGTATCCAGTATGTCA  293
HRAS-1       CCTGAGGAGCGATGACGGAATATAAG[ctggtggtggtgggcgccggcggtgt]GGGCAAGAGTGCGCTGACCAC  294
HRAS-2       ATGTACTGGTCCCGCATGGCG[ctgtactcctcctg]GCCGGCGGTATCCAGGATGA  295
KRAS-Cntrl   AGGACTTTTAGAATTCTTAAATGTCATCCGC[at]AGGTGTTTTGTCAATATTATAAACAGGAT  296
BRAF-Cntrl   TCTAGCTTGCTGGTGTATTCTTCATAGG[cctataaaataaagcagacttatat]TCAATCCGGACTTTGTCCTGAT  297

Note: Forward and reverse primer sequences are in capital letters, to the left and right of the square brackets, respectively. The actual reverse primer sequence would be the reverse-complement of that shown above. The genomic wild-type amplicon target sequence is in lower case letters within the square brackets.

5. Next, any variants or mutations that existed within the amplicon target region for each gene segment were identified and quantified (nucleotides in this region were designated by a “X” in FIG. 14). Wild-type sequences of the amplicon target regions (region between flanking primers) for all 40 gene segments are listed in lower case letters within square brackets in Table 11. All clones belonging to a particular sample-specific bin and a particular gene segment bin were compared to the wild-type sequence in the amplicon target region. If a clonal sequence had perfect agreement between its two overlapping reads in the amplicon target region, but deviated from the wild-type sequence, then that clone was identified and counted as a “consistent variant”. If a clonal sequence had perfect agreement between its two overlapping reads in the amplicon target region, and was perfectly consistent with the wild-type sequence, then it was identified and counted as an “exact match to wild type”. If a clonal sequence had deviations from the amplicon target reference sequence seen in either or both of the forward and reverse reads, but the two reads were not perfectly consistent with each other, then that clone was identified and counted as an “inconsistent variant”. Any mismatches, insertions, or deletions relative to the reference sequence (whether found in both reads or in a single read) were counted and tabulated for each position within the amplicon target region for all sequences in a given bin (for purposes of illustration, results of a hypothetical experiment are shown in FIG. 15).

6. In order to distinguish mutant sequences that were present in the original template DNA molecules from those arising due to sequencing errors or errors introduced during PCR amplification or sample processing, sequences called “molecular lineage tags” (MLTs) were used. As shown in FIG. 14, the sequence for MLT-1 comprised a total of 8 degenerate nucleotide positions (derived by concatenating 6 positions of MLT-1a and 2 positions of MLT-1b). Each of the eight N positions had an approximately equal likelihood of having an A, C, G, or T nucleotide, so that 4^8 = 65,536 possible MLT sequences could be generated. Thus, a particular primer molecule would be expected to have any one of the 65,536 possible MLT sequences. Prior to amplification by PCR, the DNA template molecules were copied by primer-extension, and a MLT-1 sequence became attached to each primer-extended copy. Thus, each template copy was tagged with one of 65,536 possible MLT-1 sequences.
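The three-way classification of clones described in step 5 can be expressed compactly. The sketch below is a simplification and not the actual analysis code: it assumes that both reads have already been aligned, oriented on the same strand, and trimmed so that each covers the whole amplicon target region, whereas in practice the reads may overlap only partially. The wild-type target is the KRAS-1 region from Table 11, and the variant is a hypothetical single-base substitution chosen for illustration.

```python
from collections import Counter


def classify_clone(fwd_target, rev_target, wild_type):
    """Classify one clone from its two reads of the amplicon target region."""
    if fwd_target != rev_target:
        # The reads disagree, so at least one of them deviates from wild type.
        return "inconsistent variant"
    if fwd_target == wild_type:
        return "exact match to wild type"
    return "consistent variant"


KRAS1_WT = "GTTGGAGCTGGTGGCGT"       # target region of KRAS-1 (Table 11)
KRAS1_VARIANT = "GTTGGAGCTGATGGCGT"  # hypothetical substitution

clones = [
    (KRAS1_WT, KRAS1_WT),            # both reads wild type
    (KRAS1_VARIANT, KRAS1_VARIANT),  # both reads carry the same change
    (KRAS1_VARIANT, KRAS1_WT),       # change seen in one read only
]

tally = Counter(classify_clone(f, r, KRAS1_WT) for f, r in clones)
for label, count in tally.items():
    print(f"{label}: {count}")
```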

To identify variants arising from mutant template DNA molecules, first a list of all “consistent variants” was generated. If a “consistent variant” sequence was seen in more than one clone within a bin of sequences, then the number of copies of such variants was counted. These variants were listed along with the number of clonal copies (in descending order of frequency) as shown in FIG. 15. Then, for all clones belonging to a particular “consistent variant” sequence, a list of MLT-1 sequences associated with the clones was generated (the actual list of MLT-1 sequences was not displayed). Within each list, any MLT sequence that was found to be associated with more than one clone was classified as a “multiply occurring MLT”. A histogram of such multiply occurring MLTs was generated for each variant (as shown in FIG. 15). The count of different MLT-1 sequences occurring “N” times for a given variant was listed in a numerical table (where N was the number of copies of the same MLT). An alternate way to present the MLT-1 counts was to list the “N” value and the number of different MLTs having that number of copies (e.g. N×Z, where Z is the number of different MLT sequences having N copies). For variant sequences arising from one or more mutant DNA template molecules, there was a high probability of finding multiply occurring MLTs with a high “N” value (because the number of clonal sequences sampled post-amplification was several-fold greater than the number of template DNA molecules that were copied and tagged in the primer-extension reaction). In contrast, for variant sequences arising from errors introduced during amplification of wild-type template molecules, it was unlikely to find MLT-1 sequences with “N” values as high as those associated with true mutant templates. Since known mutant template oligonucleotides had been spiked into each sample prior to amplification, these internal standards were used to determine the range of “N” values that should be expected for variant sequences derived from unknown mutant templates. Values falling below that range were presumed to be associated with variants arising from errors of amplification or sequencing.
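The tabulation of multiply occurring MLTs for one “consistent variant” can be sketched as follows. The function names and the example tag sequences are hypothetical; the input is assumed to be the list of MLT-1 sequences, one per clone, for a single variant within one sample and gene segment bin.

```python
from collections import Counter

MLT_SPACE = 4 ** 8  # number of possible 8-nucleotide MLT-1 sequences


def mlt_histogram(mlt_per_clone):
    """Map N (copies of one MLT) to Z (number of distinct MLTs seen N times).

    Only multiply occurring MLTs (N > 1) are kept.
    """
    copies_per_mlt = Counter(mlt_per_clone)
    histogram = Counter(n for n in copies_per_mlt.values() if n > 1)
    return dict(sorted(histogram.items(), reverse=True))


def format_histogram(histogram):
    """Render the histogram in the N x Z notation described in the text."""
    return ", ".join(f"{n}x{z}" for n, z in histogram.items())


# Hypothetical clones of one variant: two tags seen 5 times, one tag seen
# twice, and one tag seen once (which is not multiply occurring).
example = ["ACGTACGT"] * 5 + ["TTGACCAG"] * 5 + ["GGGTCAAC"] * 2 + ["CATCATGA"]

hist = mlt_histogram(example)
print(f"possible MLT-1 sequences: {MLT_SPACE}")
print(format_histogram(hist))
```

A variant whose histogram contains entries at high N would, following the reasoning above, be a candidate for having arisen from a tagged mutant template rather than from a later amplification or sequencing error.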

7. A “mutation authenticity score” (MAS) can be used to facilitate the identification of variant sequences arising from mutant template DNA molecules. The MLT copy numbers, “N”, that are associated with the spiked-in mutant internal control oligos (having mutations at two distinct positions) can be evaluated (Table 7). The variant sequences that arise from these authentic mutant templates are associated with MLT-1 sequences having relatively high “N” values. The value “N_(auth)” is in one embodiment the mean “N” value for these known authentic mutant templates. The “N_(auth)” value can be weighted or unweighted. If a mutation were introduced during the first cycle of PCR, the “N” value of such a variant sequence would be approximately (1/1.7)×N_(auth) (if each cycle of PCR yields approximately 1.7-fold amplification). Similarly, a variant sequence would have “N” values of approximately (1/1.7^(y))×N_(auth) (if a mutation were introduced during the y^(th) cycle of PCR). Thus, a mutation authenticity score is calculated for each “consistent variant” sequence based on how close the “N” values of its MLTs are to the “N_(auth)” values of the authentic mutants that are spiked into the reaction. A variant would be likely to be authentic if its “N” values were distributed within a defined range of the N_(auth) values.
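The relationship between the PCR cycle at which an error arises and the expected “N” value can be worked through numerically. In the sketch below, `expected_n` implements the (1/1.7^(y))×N_(auth) relationship stated above. The `within_range` check and its bounds are purely hypothetical placeholders for the “defined range” mentioned in the text, which is not specified here, and the N_(auth) value of 100 is an arbitrary example.

```python
def expected_n(n_auth, cycle, fold_per_cycle=1.7):
    """Expected MLT copy number for a variant first introduced in `cycle`.

    cycle = 0 corresponds to a mutation already present in the template.
    """
    return n_auth / fold_per_cycle ** cycle


def within_range(n_values, n_auth, low=0.7, high=1.5):
    """Hypothetical authenticity check: all N values near N_auth.

    The bounds are illustrative assumptions, not values from this example.
    """
    return all(low * n_auth <= n <= high * n_auth for n in n_values)


N_AUTH = 100.0  # arbitrary example value for the spiked-in controls

for y in range(4):
    print(f"error introduced in cycle {y}: expected N ~ {expected_n(N_AUTH, y):.1f}")

print(within_range([95, 110, 88], N_AUTH))
print(within_range([expected_n(N_AUTH, 1)], N_AUTH))
```

With 1.7-fold amplification per cycle, even an error made in the first cycle is expected to show an “N” value of only about 59% of N_(auth), and the gap widens with each later cycle.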

Results

A set of control plasma-derived DNA samples was tested. These samples contained various ratios of normal plasma DNA spiked with known amounts of mutant oligonucleotides (listed in Table 7). It was consistently observed that the PCR products were formed in a highly specific manner for all 40 gene segments included in the panel. The methods were extensively tested using a real-time quantitative thermal cycler, and comparisons to negative controls having no plasma DNA or having mouse DNA confirmed that the intended targets were being amplified. The products of all 40 PCRs were run on an agarose gel, and the production of appropriate-sized amplicons was confirmed.

Sequencing of the 40 pooled amplicons from multiple barcoded samples on the Illumina HiSeq® 2000 platform further confirmed that all intended gene segments were amplified. The total number of raw clonal sequences yielded was 282,965,036. After filtering, the rejected sequences were as follows:

Failed Illumina's chastity filter: 79,320,290
Positions 5-7 in forward read were not “ACT”: 23,751,168
Rejected because of the presence of an N in position 8 or beyond: 27,477,576
Failed to recognize forward primer: 6,304,833
Barcode did not exactly match one in our set of 96: 15,921,744
Positions 11-12 of index read were not “AT”: 2,903,342
Failed to recognize reverse primer: 17,064,651
Remaining filtered reads: 110,221,432
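The filter counts can be checked for internal consistency with a few lines of Python. This is only an arithmetic check on the figures reported above; the dictionary keys are shortened labels, not identifiers from the analysis code.

```python
RAW_READS = 282_965_036

# Reads rejected at each filtering step, as reported above.
REJECTED = {
    "failed chastity filter": 79_320_290,
    "positions 5-7 of forward read not ACT": 23_751_168,
    "N at position 8 or beyond": 27_477_576,
    "forward primer not recognized": 6_304_833,
    "barcode not an exact match to the set of 96": 15_921_744,
    "positions 11-12 of index read not AT": 2_903_342,
    "reverse primer not recognized": 17_064_651,
}

remaining = RAW_READS - sum(REJECTED.values())
print(f"{remaining:,} reads remain ({remaining / RAW_READS:.1%} of raw reads)")
```

The remainder equals the 110,221,432 filtered reads reported, which is also the total of Table 12.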

The total number of filtered counts assigned to each of the 40 gene segments is listed in Table 12. These data revealed a relatively even distribution of counts across the various amplicons.

TABLE 12
Number of filtered sequence counts associated with each targeted gene segment.

Gene segment    Filtered sequence counts
AKT1            1927236
APC             3263261
ATM             2988621
BRAF            2644671
BRAF-Cntrl      2827920
CTNNB1          3387874
EGFR-1          2582553
EGFR-2          2670482
EGFR-3          1908549
FGFR3-1         2441848
FGFR3-2         1907661
FGFR3-3         2154173
FoxL2           2782971
GNAS            2481154
HRAS-1          2173717
HRAS-2          2456244
KIT-1           1960170
KIT-2           4202032
KIT-3           2739896
KRAS-1          5647782
KRAS-2          3421539
KRAS-Cntrl      3076757
MET-1           2923088
MET-2           2956664
NRAS-1          4037334
NRAS-2          2462906
PIK3CA-1        2124016
PIK3CA-2        2807504
PPP2R1A-1       2087213
PPP2R1A-2       1966662
PTEN-1          3125952
PTEN-2          2562118
STK11-1         2969119
STK11-2         2613351
TP53-1          2494793
TP53-2          2248788
TP53-3          3074251
TP53-4          2964973
TP53-5          2632869
TP53-6          2522720
Total           110221432

The sequence data were processed using a modified version of the computer code that was used in Example 1. The results demonstrated that control double-mutant oligonucleotides that were spiked into plasma DNA could be reliably detected and quantified. Requiring consistency of overlapping paired-end reads appeared to eliminate the vast majority of sequencer errors. Also, analysis of the MLT sequences associated with “consistent variants” made it possible to distinguish sequences arising from authentic mutant templates from those introduced during amplification or sequencing. An example of processed data for the BRAF gene target region for a sample in which approximately equal numbers of copies of normal plasma DNA and double-mutant control oligos were mixed is shown in FIG. 16. This output represented the analysis of data from a single bin (single gene segment, single barcode). In this example, a total of 103,742 clonal sequences was assigned to the bin, and 65,143 of these counts arose from amplification of the double-mutant oligonucleotide that was spiked in. The double-mutant sequences comprised approximately half of the total sequences in this bin. The MLT counts associated with each consistent variant were listed in the format N×y where y was the number of unique MLT sequences that had N copies associated with that particular consistent variant. It was observed that the variant sequence arising from spiked-in control mutant templates was associated with several MLT-1 sequences having high copy numbers (N). The highest value of N for this variant was 3742, and there were many distinct MLTs that had copy numbers in the thousands. In contrast, the next most abundant variant had only 965 total counts and was associated with only a few distinct MLTs. The highest value of N for this variant was 576, and only one other MLT had a count in the hundreds range. All other consistent variants in the list were associated with very low MLT copy numbers. Based on these observations, it could be confirmed that only the spiked-in control oligonucleotides produced variant sequences that had high MLT counts. MLT counts associated with variant sequences that likely arose from errors of PCR or sequencing were much lower, as predicted.

Example 3

This example demonstrates the application of methods that incorporated the methods of Example 2 and included modifications thereof. One modification was the elimination of separate PCRs for each target DNA in the final step. Instead, the final amplification was performed in a single tube using universal PCR primers. This also eliminated the requirement for a pre-amplification step. Pooled amplification was made possible by copying, tagging, and purifying the targeted DNA regions in a highly selective manner, so that spurious templates that could be amplified by universal primers in the final PCR would be minimized (FIG. 1). In this example, the same 40 genomic target regions were analyzed as in Example 2.

Methods

Preparation of Mixtures of Primers Having Combinations of Modular Oligonucleotide Segments

Mixtures of primers having combinations of modular barcode segments and gene-specific segments were prepared as described in Example 2. The preferred approach, called “modular automated synthesis and purification”, is schematized in FIG. 3, and is described in detail in Example 2.

Collection and Processing of Patient Plasma Samples

Blood was collected and processed as described in Example 2.

Extraction and Purification of DNA from Plasma

DNA was extracted from plasma as described in Example 2.

Round 1 PCR

In order to make a limited number of tagged copies (fewer than 20) of the plasma-derived template DNA molecules, a few cycles of PCR were performed (in contrast to the primer extension that was performed in Example 2). The reverse primers used in the first round of PCR were the modular barcoded mixtures of gene-specific primers as described above (same as the primers used in the primer extension reaction in Example 2). For forward primers, the same oligonucleotides were used as the biotinylated capture oligonucleotides that had been used in Example 2 to purify the primer-extension products. The sequences of the forward primers are listed in Table 8.

Forty different gene regions were targeted, and therefore a combination of 40 different biotinylated forward primers and 40 different modular barcoded gene-specific reverse primers was used in the Round 1 PCR for each sample. For a given sample, the mixture of gene-specific reverse primers all had the same, sample-specific barcode in the 5′ segment. The primer mixes were produced so that an approximately equimolar concentration of 40 different forward and 40 different reverse primers would be present in the reaction (final concentration of approximately 100 nM each primer). In addition to sample-specific barcodes, the reverse primers also contained degenerate sequence regions known as molecular lineage tags (MLTs) as well as common sequences at the 5′-end that allowed for hybridization of “universal” PCR primers in subsequent steps. The MLT assigned to each copy in Round 1 PCR was referred to as MLT-1.

Control DNA molecules containing known mutations were spiked into each Round 1 PCR to serve as internal quantitative standards. As described in Example 2, these DNA molecules were cartridge-purified oligonucleotides that were synthesized to contain variations from the wild-type sequence at two distinct positions. These variations allowed the control sequences to be readily distinguished from other variants within DNA purified from a clinical sample. The sequences of the top strands of these control DNA oligonucleotides are listed in Table 7. Bottom strands were also synthesized corresponding to the reverse complements of these 40 sequences. In order to make the control DNA as similar as possible to the clinically-derived DNA, both strands were annealed to make them double-stranded before adding them to the Round 1 PCR. The double-stranded DNA was quantified by UV spectrometry and then diluted to the desired concentration. To each PCR, approximately 200 copies of the double-stranded control DNA fragments corresponding to each of the 40 gene target sites were added.

The Round 1 PCR amplification consisted of the following components: (1) template DNA purified from plasma and eluted in 20 microliters of Qiagen elution buffer AVE, (2) 1× Phusion® buffer HF, (3) 200 mM of each dNTP (dATP, dCTP, dGTP, and dTTP), (4) mixture of 40 reverse barcoded primers, 100 nM each, (5) mixture of 40 forward biotinylated primers, 100 nM each, (6) 200 copies of double-stranded control DNA, (7) molecular grade water as needed to make the desired total volume, and (8) Phusion® Hot Start Flex DNA polymerase (0.04 U/μL). The total volume of each reaction was 40 microliters (for each 20 μL eluted plasma DNA sample). A separate reaction was set up for each sample.

Thermal cycling was carried out on a BioRad iCycler® using the following protocol: (1) 98° C. for 45 seconds, (2) 98° C. for 10 seconds, (3) 70° C. for 30 seconds, (4) slowly cooling by 1° C. every 30 seconds down to 56° C., (5) 55° C. for 2 minutes, (6) 72° C. for 1 minute, (7) repeat steps 2 to 6 for 3 cycles total, and (8) hold temperature at 72° C. indefinitely.

As quickly as possible, while the reaction was still at 72° C., EDTA (10 mM final concentration) was added to terminate the polymerase activity. Each tube was agitated gently to ensure even mixing of the EDTA. Since the PCR products now had sample-specific barcodes attached, the products of all reactions could be pooled together into a single tube.

Purification of Round 1 PCR Products

Since the forward primers used in the Round 1 PCR contained biotin tags at their 5′-ends, these tags were incorporated into the PCR products and were used to purify the products. To capture the biotin-tagged PCR products, 10 μL of high capacity streptavidin-agarose bead slurry (Thermo-Fisher) was added (per 40 μL PCR). Thus, for example, if fifty Round 1 PCRs were performed in a volume of 40 μL each, then the volume after combining all samples would be 2 mL, and 500 μL of bead slurry would be used. Tubes were turned end-over-end constantly for at least 2 hours at room temperature to promote binding of biotinylated DNA to the streptavidin beads. Beads were then gently and briefly centrifuged at low speed, and any unbound supernatant was carefully removed, avoiding aspiration of any beads. The beads were then washed in 200 μL of buffer containing 10 mM Tris pH 7.6, 50 mM NaCl, and 1 mg/mL salmon sperm DNA (“wash buffer”). Beads were suspended in wash buffer by gentle agitation, were gently centrifuged, and then the supernatant wash buffer was discarded. A second wash was performed in the same way, except that the suspended beads were incubated at 50° C. for 25 minutes followed by 60° C. for 5 minutes while the tube was turned end-over-end to promote dissociation of any DNA molecules that may have annealed non-specifically to the biotinylated oligonucleotides. The beads were again centrifuged gently, and the supernatant wash buffer was removed.

Optionally, between the first and second washes, the beads were treated with Exonuclease I (New England Biolabs) in order to digest any single-stranded DNA (including un-extended biotinylated primer) that was bound to the beads. For the tested samples, it was found that this nuclease treatment was not necessary following the first round of PCR. For digestion, the beads were suspended in 1× Exonuclease I buffer (2 μL for every 1 μL of beads), and then Exonuclease I enzyme was added to a final concentration of 0.5 μL. The reaction was incubated at 37° C. for 30 minutes. The beads were then centrifuged, the supernatant was discarded, and the beads then were subjected to the second wash.

The captured PCR products were then eluted from the surface of the washed beads by heat-denaturation. Elution was carried out by heating the beads to 95° C. for 30 seconds directly in Round 2 PCR cocktail (as described below), gently centrifuging the beads, and harvesting the eluted DNA within the supernatant cocktail. Note that only one strand of the PCR product was eluted because the biotin-streptavidin interaction was not substantially disrupted by heating at 95° C., and thus the biotinylated strand would remain bound to the beads. Likewise, any un-extended biotinylated oligonucleotides would also remain bound to the beads.

Round 2 PCR

The second round of PCR was also performed for only a few cycles (between 2 and 4). This PCR provided additional selectivity by using a mixture of 40 nested forward primers that would specifically hybridize to the desired genomic target sequences. This step also provided a second molecular lineage tag on the other side of the mutation-prone target sequence (opposite to the barcode and MLT-1). The forward primers contained a stretch of degenerate positions, called “molecular lineage tag-2” (MLT-2), which was useful in determining which sequences had become labeled with the wrong barcode due to sequence crossover during pooled amplification. The forward primers also contained a common sequence at their 5′-ends which served as a universal primer binding site in the third and final round of PCR. This common sequence also provided some of the adapter sequences required for sequencing on the Illumina platform. The reverse primer used in Round 2 PCR had a biotin tag at its 5′-end which was used for purification of the Round 2 PCR products.

The purified Round 1 PCR products were eluted directly into a cocktail that was used for Round 2 PCR. For every 10 μL of bead slurry that was used, 40 μL of PCR cocktail was used for elution. The Round 2 PCR cocktail consisted of the following components: (1) 1× Phusion® buffer HF, (2) 200 mM of each dNTP (dATP, dCTP, dGTP, and dTTP), (3) a mixture of 40 nested forward primers, 100 nM each, (4) 10 ng/μL salmon sperm DNA, and (5) molecular grade water as needed to make the desired total volume.

After elution of the single-stranded PCR product from the beads into the above cocktail (and removal of the beads), a biotinylated universal reverse primer was added to achieve a final concentration of 200 nM. This biotinylated primer had to be added to the cocktail after removal of the streptavidin-agarose beads to prevent the biotin from binding to the beads. Finally, Phusion® Hot Start Flex DNA polymerase was added to the cocktail to a final concentration of 0.04 units per microliter, and was mixed by gently pipetting the cocktail up and down. If the total volume was greater than recommended for a single PCR tube, then the cocktail was split into the appropriate number of identical reaction volumes.

Thermal cycling was carried out on a BioRad iCycler® using the following protocol: (1) 98° C. for 45 seconds, (2) 98° C. for 10 seconds, (3) 70° C. for 30 seconds, (4) slowly cooling by 1° C. every 30 seconds down to 61° C., (5) 60° C. for 2 minutes, (6) 72° C. for 1 minute, (7) repeat steps 2 to 6 for 3 cycles total, and (8) hold temperature at 72° C. indefinitely.

As quickly as possible, while the reaction was still at 72° C., EDTA (10 mM final concentration) was added to terminate the polymerase activity. The tube was agitated gently to ensure even mixing of the EDTA.

The sequences of the 40 nested forward primers were the same as those provided in Table 10, except that the sequence “NNNNACT” in each primer was replaced by “NNNNNN”. The common “ACT” sequence was removed because it led to poor sequence diversity, which produced low-quality base-calls on the Illumina sequencer. Instead, the stretch of degenerate positions was increased from 4 to 6 bases to provide a greater number of sequence combinations at MLT-2. The sequence of the biotinylated reverse primer used in Round 2 PCR (called BioV2rev) was as follows: 5′-Biotin-CGAGACGGATCAAGCAGAAGACG-3′ (SEQ ID NO:214).
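The effect of the MLT-2 change on tag diversity can be illustrated with a small Python sketch. The helper names are hypothetical; the only inputs taken from the text are the two tag patterns.

```python
def tag_diversity(pattern):
    """Number of distinct sequences matching a pattern in which N means any base."""
    return 4 ** pattern.count("N")

def fixed_positions(pattern):
    """1-based positions at which every read carries the same base."""
    return [i + 1 for i, base in enumerate(pattern) if base != "N"]

for pattern in ("NNNNACT", "NNNNNN"):
    print(pattern, tag_diversity(pattern), fixed_positions(pattern))
```

The Example 2 pattern allows 256 tag sequences and fixes positions 5 to 7 (the “ACT” checked by the read filter of Example 2), whereas the Example 3 pattern allows 4,096 tag sequences and has no invariant positions.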

Purification of Round 2 PCR Products

The biotin tag at the 5′-end of the reverse primer used in Round 2 PCR was used to capture and purify the products of Round 2 PCR. This step removed any un-extended forward primers, as well as many spurious products that might have been produced during the amplification, which prevented inappropriate incorporation of new MLTs during the next round of amplification.

The capture, washing, digestion, and elution of the Round 2 PCR products were performed in a manner that was essentially identical to the process described above for the purification of Round 1 PCR products. In Round 2 PCR purification, the Exonuclease I step was not optional. Thus, the beads were washed once in wash buffer at room temperature, then were treated with Exonuclease I, and then were washed a second time at elevated temperature (50° C. for 25 minutes followed by 60° C. for 5 minutes) to remove non-specific DNA. Fewer beads were used for a given volume of Round 2 PCR reaction. Five microliters of bead slurry was used for every 40 μL of PCR reaction volume.

Elution of the captured PCR products was also performed in a manner that was essentially the same as that used for purification of the Round 1 PCR products. The streptavidin-agarose beads were heated to 95° C. for 30 seconds to elute the product directly into a cocktail that was used for Round 3 PCR (described below). The biotinylated strand of the PCR product remained bound to the beads, while the opposite strand was eluted into the Round 3 PCR cocktail.

Round 3 PCR

The third and final round of PCR amplified the DNA molecules that were specifically tagged, copied, and purified in the first two rounds of PCR. To provide sufficient DNA for visualization by ethidium bromide staining on an agarose gel, the amount of PCR product from Round 3 had to be substantial (at least 0.5 microgram). Thus, the final PCR amplification was carried to saturation or beyond (typically 15 to 35 cycles, depending on the amount of template DNA in each sample and the total number of samples that were pooled).

In contrast to the final PCRs in Example 2, which were performed separately for each genomic target site, the final PCR in the present Example was performed in a combined reaction volume for all genomic targets and for all samples. This extremely high level of multiplexing was only possible because of the highly selective methods used for amplification and purification in the prior two rounds of PCR.

As described above, the Round 2 PCR products were eluted directly into Round 3 PCR cocktail. The volume of this cocktail depended on the volume of beads used. For every 5 μL of bead slurry, 20 μL of PCR cocktail was used. The Round 3 PCR cocktail consisted of the following components: (1) 1× Phusion® buffer HF, (2) 200 mM of each dNTP (dATP, dCTP, dGTP, and dTTP), (3) Universal forward and reverse primers, 200 nM each, (4) 10 ng/μL salmon sperm DNA, and (5) molecular grade water as needed to make the desired total volume.
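The bead and cocktail volumes quoted for the three rounds scale with the number of pooled Round 1 reactions. The Python sketch below chains the stated ratios; it assumes that the Round 2 reaction volume equals the Round 2 cocktail volume (the small additions of primer and polymerase are ignored), and the function name and dictionary keys are hypothetical.

```python
def pooled_volumes_uL(n_reactions, round1_volume_uL=40.0):
    """Scale bead slurry and PCR cocktail volumes from the per-reaction ratios."""
    pooled_round1 = n_reactions * round1_volume_uL
    beads_round1 = pooled_round1 * 10 / 40     # 10 uL slurry per 40 uL Round 1 PCR
    cocktail_round2 = beads_round1 * 40 / 10   # 40 uL cocktail per 10 uL slurry
    beads_round2 = cocktail_round2 * 5 / 40    # 5 uL slurry per 40 uL Round 2 PCR
    cocktail_round3 = beads_round2 * 20 / 5    # 20 uL cocktail per 5 uL slurry
    return {
        "pooled_round1": pooled_round1,
        "beads_round1": beads_round1,
        "cocktail_round2": cocktail_round2,
        "beads_round2": beads_round2,
        "cocktail_round3": cocktail_round3,
    }

volumes = pooled_volumes_uL(50)
for name, value in volumes.items():
    print(f"{name}: {value:.0f} uL")
```

For fifty Round 1 reactions this reproduces the 2 mL pooled volume and 500 μL of bead slurry given in the Round 1 purification, and under the stated assumption it gives 250 μL of bead slurry for Round 2 and 1 mL of Round 3 cocktail.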

After elution of the single-stranded PCR product from the beads into the above cocktail, Phusion® Hot Start Flex DNA polymerase was added to the cocktail to a final concentration of 0.04 U/μL, and was mixed by gently pipetting the cocktail up and down. If the total volume was greater than recommended for a single PCR tube, then the cocktail was split into the appropriate number of identical reaction volumes. Mineral oil (20 μL) was added to the tube(s) to prevent evaporation during PCR.

Thermal cycling was carried out on a BioRad iCycler® using the following protocol: (1) 98° C. for 45 seconds, (2) 98° C. for 10 seconds, (3) 62° C. for 30 seconds, (4) 72° C. for 20 seconds, (5) repeat steps 2 to 4 for 35 cycles total, and (6) hold temperature at 4° C. indefinitely.

Soon after the reaction had reached 4° C., EDTA (10 mM final concentration) was added to terminate the polymerase activity. Since the PCR product was under mineral oil, a pipette with a filtered tip was used to evenly mix the EDTA. Special care was taken to avoid contamination of other reagents and workspaces with PCR products.

Preparation of DNA for Next-Generation Sequencing

The product of the Round 3 PCR was purified on a 2% agarose gel, as described in Example 2. Since the products were not of a homogeneous length, a somewhat diffuse band was seen on the gel. The band was cut with a few mm margin above and below to ensure inclusion of any low-intensity bands that may have been difficult to visualize. A QIAquick® Gel Extraction kit (Qiagen) was used to isolate the DNA from the gel slice. The DNA was eluted into 50 μL of EB buffer (supplied in the kit).

Next-Generation Sequencing

Next-generation sequencing was performed as described in Example 2, using the Illumina HiSeq® 2000 platform. In the present example, the Illumina MiSeq® instrument was also used with similar success for samples requiring less sequence depth. In contrast to Example 2, addition of Phi-X DNA to improve sequence diversity was not necessary in the present Example because modification of the Round 2 PCR forward primers to remove the common “ACT” sequence and to lengthen MLT-2 resulted in adequate sequence diversity.

Outline of Algorithm for Sequence Analysis

Essentially the same algorithm that was described in Example 2 was applied to the data generated in Example 3. Although many of the processing steps used in Example 3 differ from those used in Example 2, the structure of the final double-stranded DNA products is virtually identical. Thus, a very similar algorithm can be applied for sorting, aligning, and counting the resulting sequences. As noted above, the region of MLT-2 which was “NNNNACT” in Example 2 was replaced with “NNNNNN” in Example 3, and this change was accounted for in the modified algorithm.

To minimize the probability of mis-classifying a variant sequence as belonging to the wrong sample, MLT-1 and MLT-2 sequences were used to distinguish sequences in which barcode “cross-over” may have occurred during pooled amplification. Since a portion of MLT-1 is adjacent to the barcode sequence, and MLT-2 is on the other side of the target region (FIG. 14), molecules that undergo such cross-over between the barcode and the mutation-prone region would also undergo cross-over between MLT-1 and MLT-2 (or between the two separate regions of MLT-1). Such “crossed-over” sequences would be expected to have a low number of copies having a given combination of MLT-1 and MLT-2 sequences. In contrast, sequences arising from an authentic mutant template that remained attached to its originally assigned barcode would be expected to have a greater number of copies of a given MLT-1 and MLT-2 combination.
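The cross-over reasoning above can be sketched in Python by counting the clonal copies of each MLT-1/MLT-2 combination within one bin. The read representation, the function names, and the copy-number threshold are all hypothetical; the text does not specify a cutoff.

```python
from collections import Counter

def pair_counts(reads):
    """Count clonal copies of each (MLT-1, MLT-2) combination in one bin."""
    return Counter((read["mlt1"], read["mlt2"]) for read in reads)

def flag_possible_crossover(reads, min_copies=3):
    """Return the combinations seen fewer than min_copies times.
    The threshold is an illustrative assumption, not a value from the text."""
    return {pair for pair, n in pair_counts(reads).items() if n < min_copies}

# Toy bin: one well-supported combination and two rare combinations that
# each share one tag with it, as would be expected after a cross-over.
reads = (
    [{"mlt1": "AAAC", "mlt2": "GGTTCA"}] * 6
    + [{"mlt1": "AAAC", "mlt2": "TTTTTT"}]
    + [{"mlt1": "CCGA", "mlt2": "GGTTCA"}]
)
print(flag_possible_crossover(reads))
```

A low copy number alone does not prove cross-over, since a combination can also be rare because of shallow sampling, so such a flag is best read as a prompt for closer inspection.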

The algorithm in Example 3 was modified to facilitate evaluation of the relationship between MLT-1 and MLT-2 sequence counts for each “consistent variant” and also for the wild-type sequences. In order to report these counts in a reasonably succinct format, it was necessary to bin MLT counts by powers of two. For example, an MLT-1 count of 13 would be placed into bin 4 (because 2^4 is the smallest power of 2 that is greater than or equal to 13). Thus, a report of 4×5 meant that there were five instances of counts in the range of 9 to 16. Similarly, a report of 3×6 meant that there were six instances of counts in the range of 5 to 8. For a given collection of MLT-1 counts, the associated MLT-2 counts were reported in a similar format, to the right of the MLT-1 counts and separated by colons. For example, 4×5:2×3:1×7 meant that among 5 sets of MLT-1 sequences occurring between 9 and 16 times, there were 3 instances of MLT-2 sequences that occurred between 3 and 4 times, and 7 instances of MLT-2 sequences that occurred twice. Different MLT-1 bins were separated by a space.
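The power-of-two binning can be sketched in Python as follows. Only the MLT-1 part of the report is shown; the nested MLT-2 fields are omitted, the function names are hypothetical, and the letter "x" stands in for the "×" symbol.

```python
from collections import Counter

def power_of_two_bin(count):
    """Smallest exponent b such that 2**b >= count (count must be >= 1)."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return (count - 1).bit_length()

def bin_report(counts):
    """Summarize counts as 'BINxINSTANCES' entries, highest bin first."""
    bins = Counter(power_of_two_bin(c) for c in counts)
    return " ".join(f"{b}x{n}" for b, n in sorted(bins.items(), reverse=True))

# Five counts in the range 9 to 16 and six counts in the range 5 to 8.
mlt1_counts = [13, 9, 16, 10, 12, 5, 6, 7, 8, 8, 5]
print(bin_report(mlt1_counts))
```

Using the integer bit length avoids floating-point rounding at exact powers of two, which a logarithm-based formula would have to guard against.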

Results

Purified DNA that was obtained from 0.5 mL of plasma of healthy volunteers was mixed with various amounts of the control mutant oligonucleotides listed in Table 7. Between 200 and 5,000 copies of each of the 40 control oligonucleotides were added to each purified plasma DNA sample. These mixtures were subjected to 3 rounds of PCR and purification as described in the methods. The highly multiplexed Round 3 PCR, in which multiple gene targets from multiple samples were amplified in a single tube, resulted in the specific production of amplicons of the expected size. As shown in FIG. 17, a relatively broad band was seen migrating at a size corresponding to between 200 and 300 base pairs on a 2% non-denaturing agarose gel (approximately centered at 250 base pairs). The primers and target regions were of variable length, and the amplification products spanned a range of sizes. Extensive testing of negative controls confirmed that these products were specific and were absent when mouse DNA or no template DNA was substituted for human plasma DNA in Round 1 PCR.

The gel-extracted PCR products were subjected to next-generation sequencing using an Illumina MiSeq® instrument. The total number of raw clonal paired-end sequences was 20,511,389. After application of the various filters described above, the remaining sequences numbered 11,184,975. The 40 different gene target regions were fairly evenly represented among the filtered sequences. The median sequence count for the 40 gene-specific regions (all barcodes) was 166,867.

After processing of the sequence data using the computer algorithm described above, control mutant oligonucleotides that were spiked into the plasma DNA were identified and quantified. Importantly, they were readily distinguished from the vast majority of errors introduced during amplification, processing, or sequencing. As observed previously in Example 1 and Example 2, the sequence redundancy provided by the clonal overlapped paired-end reads was able to virtually eliminate sequencer-generated errors in the mutation-prone sequence regions. The “consistent variants” were then analyzed for the distribution of their associated MLT sequences. As an example, the summary output for analysis of sequences belonging to a single barcode and target gene region KRAS-2 (region surrounding codon 61 of the KRAS gene) is shown in FIG. 18. In this sample, approximately 200 copies of the mutant oligonucleotides were spiked in (each having two distinct mutations relative to the wild-type sequence). The mismatches of the “consistent variants” relative to the reference wild-type sequence are displayed in the lower portion of FIG. 18. This single data bin contained 13,315 total sequences, of which 10,815 were exact matches to the wild-type sequence and 1,767 were exact matches to the spiked-in mutant sequence (an exact match requires that the overlapping portions of the paired reads agree with each other). The spiked-in mutant sequences comprised approximately 10% to 15% of the total DNA in the sample, which is in the expected range (there should be approximately 1,000 to 2,000 genome copies of fragmented DNA in 0.5 mL of plasma). The counts of MLT-1 and MLT-2 are reported for each “consistent variant” according to the scheme described above. The MLT-1 counts associated with sequences arising from the spiked-in control mutant oligonucleotides were generally higher than those associated with other variant sequences, as expected. This made it possible to distinguish many of the “consistent variants” arising from polymerase misincorporations that might have otherwise been mistaken for sequences arising from true mutant template molecules. Analysis of MLT-2 counts associated with each group of MLT-1 counts provided insight into the efficiency of molecular tagging and copying at PCR Rounds 1 and 2. It also helped to distinguish variants assigned to the wrong sample due to barcode cross-over during pooled amplification.
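The expected-range statement for the KRAS-2 bin can be checked with simple arithmetic. The Python sketch below assumes that each genome copy contributes one wild-type template for the target and that mutant and wild-type templates are copied with equal efficiency; both are simplifying assumptions, and the function name is hypothetical.

```python
def mutant_fraction(mutant_copies, wildtype_copies):
    """Fraction of templates that are mutant, assuming equal copying efficiency."""
    return mutant_copies / (mutant_copies + wildtype_copies)

observed = 1767 / 13315                      # mutant reads / all reads in the bin
expected_low = mutant_fraction(200, 2000)    # 2,000 genome copies in 0.5 mL plasma
expected_high = mutant_fraction(200, 1000)   # 1,000 genome copies in 0.5 mL plasma

print(f"observed {observed:.1%}, expected {expected_low:.1%} to {expected_high:.1%}")
```

Under these assumptions the observed fraction of about 13% falls between the two expected bounds of roughly 9% and 17%.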

Example 4: Splint-Mediated Enzymatic Ligation of Modular Oligonucleotide Segments

In this example, an alternative approach is described for the production of mixtures of primers in which each mixture had a common 5′ barcode segment and a variety of gene-specific 3′ segments. Enzymatic ligation was used to concatenate modular oligonucleotide segments. More specifically, in each ligation, a uniquely barcoded 5′ oligonucleotide segment was ligated to a uniform mixture of different gene-specific 3′ segments. A DNA splint was used to facilitate the ligation.

Gene-specific oligonucleotides with a common sequence at the 5′-end (and a 5′-phosphate group added during oligonucleotide synthesis) were mixed in equimolar ratios. The uniform mixture was divided into separate tubes and was ligated to a uniquely barcoded oligonucleotide in each tube using a biotin-tagged DNA splint as illustrated in FIG. 13. The sequences of the 5′-phosphorylated gene-specific oligonucleotides are listed in Table 13.

TABLE 13
List of chemically 5′-phosphorylated gene-specific oligonucleotides used for splint-mediated modular ligation.

Name           DNA Sequence                                          SEQ ID NO
Ph-TP53-1      X-AGACGTGTGCTCTTCCGATCTGTGCTGTGACTGCTTG  298
Ph-TP53-2      X-AGACGTGTGCTCTTCCGATCTAGCACATGACGGAGGTT  299
Ph-TP53-3      X-AGACGTGTGCTCTTCCGATCTCAAATACTCCACACGCAAATT  300
Ph-TP53-4      X-AGACGTGTGCTCTTCCGATCTATTTGGATGACAGAAACACTT  301
Ph-TP53-5      X-AGACGTGTGCTCTTCCGATCTTGTGATGATGGTGAGGATGG  302
Ph-TP53-6      X-AGACGTGTGCTCTTCCGATCTGGGACGGAACAGCTTTGAG  303
Ph-PIK3CA-1    X-AGACGTGTGCTCTTCCGATCTGCAATTTCTACACGAGATCCTCT  304
Ph-PIK3CA-2    X-AGACGTGTGCTCTTCCGATCTCTTTGGAGTATTTCATGAAACAAATGA  305
Ph-BRAF        X-AGACGTGTGCTCTTCCGATCTACAGTAAAAATAGGTGATTTTGGTCTA  306
Ph-FoxL2       X-AGACGTGTGCTCTTCCGATCTGCAACTACTGGACGCTGGAC  307
Ph-GNAS        X-AGACGTGTGCTCTTCCGATCTCAATTTTGTTTCAGGACCTGCT  308
Ph-CTNNB1      X-AGACGTGTGCTCTTCCGATCTGGCAGCAACAGTCTTACCT  309
Ph-PPP2R1A-1   X-AGACGTGTGCTCTTCCGATCTCCCAGCTTGGAGGCTGC  310
Ph-PPP2R1A-2   X-AGACGTGTGCTCTTCCGATCTGCCAGGCCGCTGAAGACA  311
Ph-PTEN-1      X-AGACGTGTGCTCTTCCGATCTGCAATTCACTGTAAAGCTGGAAAG  312
Ph-PTEN-2      X-AGACGTGTGCTCTTCCGATCTGAAGATATATTCCTCCAATTCAGGAC  313
Ph-KIT-1       X-AGACGTGTGCTCTTCCGATCTCGTTTCCTTTAACCACATAATTAGAATC  314
Ph-KIT-2       X-AGACGTGTGCTCTTCCGATCTTTTCCCTTTCTCCCCACAG  315
Ph-KIT-3       X-AGACGTGTGCTCTTCCGATCTTCCTGTAGCAAAACCAGAAATC  316
Ph-EGFR-1      X-AGACGTGTGCTCTTCCGATCTGGTGAGAAAGTTAAAATTCCCGTC  317
Ph-EGFR-2      X-AGACGTGTGCTCTTCCGATCTAGCATGTCAAGATCACAGATTTTG  318
Ph-EGFR-3      X-AGACGTGTGCTCTTCCGATCTCACCTCCACCGTGCAGCT  319
Ph-AKT1        X-AGACGTGTGCTCTTCCGATCTACCACCCGCACGTCTGT  320
Ph-ATM         X-AGACGTGTGCTCTTCCGATCTCTTCCATACTTGATTCATGATATTTTACT  321
Ph-APC         X-AGACGTGTGCTCTTCCGATCTACCTCCTCAAACAGCTCAAAC  322
Ph-FGFR3-1     X-AGACGTGTGCTCTTCCGATCTTGGGAGATCTTCACGCTGG  323
Ph-FGFR3-2     X-AGACGTGTGCTCTTCCGATCTCCCTGAGCGTCATCTGCC  324
Ph-FGFR3-3     X-AGACGTGTGCTCTTCCGATCTGCTGGTGGAGGCTGACGA  325
Ph-MET-1       X-AGACGTGTGCTCTTCCGATCTTCCCTATCAAATATGTCAACGACT  326
Ph-MET-2       X-AGACGTGTGCTCTTCCGATCTATTTTGGTCTTGCCAGAGACA  327
Ph-STK11-1     X-AGACGTGTGCTCTTCCGATCTATCGACTCCACCGAGGTCA  328
Ph-STK11-2     X-AGACGTGTGCTCTTCCGATCTACTTGGAGGACCTGCACG  329
Ph-KRAS-1      X-AGACGTGTGCTCTTCCGATCTCGTCAAGGCACTCTTGCCT  330
Ph-KRAS-2      X-AGACGTGTGCTCTTCCGATCTGATATTCTCGACACAGCAGGT  331
Ph-NRAS-1      X-AGACGTGTGCTCTTCCGATCTTCAGTGCGCTTTTCCCA  332
Ph-NRAS-2      X-AGACGTGTGCTCTTCCGATCTGACATACTGGATACAGCTGGA  333
Ph-HRAS-1      X-AGACGTGTGCTCTTCCGATCTGGTCAGCGCACTCTTGCCC  334
Ph-HRAS-2      X-AGACGTGTGCTCTTCCGATCTCATCCTGGATACCGCCGGC  335
Ph-KRAS-C      X-AGACGTGTGCTCTTCCGATCTCCTGTTTATAATATTGACAAAACACCT  336
Ph-BRAF-C      X-AGACGTGTGCTCTTCCGATCTCAGGACAAAGTCCGGATTGA  337

X = 5′-phosphate added chemically during oligonucleotide synthesis.

The sequences of the 96 different barcoded oligonucleotides contained the following common sequence, with each oligonucleotide containing a different 8-nucleotide barcode from the list in Table 6 inserted into the position marked [BC1-96]:

(SEQ ID NO: 215) 5′-CGAGACGGATCAAGCAGAAGACGGCATACGAGAT[BC1-96]NNNNNNGTGACTGGAGTTC-3′

The sequence of the 3′-biotin tagged splint oligonucleotide was:

(SEQ ID NO: 216) 5′-ATCGGAAGAGCACACGTCTGAACTCCAGTCACAAAAAAAAAAAAATCTCGTATGCCGTCTTCTGCTTGATCCGTCTCG-3′-Biotin-TEG
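How the splint holds the two segments together at the ligation junction can be checked with a short Python sketch. The three sequence fragments are copied from the oligonucleotides listed above: the 3′ end of the barcoded oligonucleotide, the first 19 bases of the common 5′ sequence of the gene-specific oligonucleotides, and the first 32 bases of the splint. The variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

BARCODED_3P_END = "GTGACTGGAGTTC"         # 3' end of the barcoded oligo
GENE_SPECIFIC_5P = "AGACGTGTGCTCTTCCGAT"  # 5' end of each phosphorylated oligo
SPLINT_5P_REGION = "ATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # 5' end of the splint

# Sequence spanning the junction in the ligated product, 5' to 3'.
junction = BARCODED_3P_END + GENE_SPECIFIC_5P

print(reverse_complement(junction) == SPLINT_5P_REGION)
```

The junction is fully complementary to the 5′ region of the splint, which places the 3′ end of the barcoded segment directly next to the 5′-phosphate of the gene-specific segment. The run of A residues that follows in the splint lies opposite the barcode and the degenerate positions.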

The barcoded oligonucleotides were cartridge purified to ensure that they were mostly full-length. They were synthesized at the 40 nmole scale, with an expected full-length yield of approximately 50 to 60%. The phosphorylated gene-specific oligonucleotides and splint oligonucleotide were purified on a polyacrylamide gel (as described in Sambrook J J, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, 2001).

The ligation reactions were carried out using the following conditions:

Molecular grade water                                        38.15 μL
Dithiothreitol (100 mM stock)                                2.5 μL
NEB Ligase buffer (10x)                                      10 μL
5′-Phosphorylated oligo mix (263 μM stock) (7 μM final)      2.7 μL
Splint oligo (248 μM stock) (14 μM final)                    5.65 μL
Barcoded oligo (50 μM stock) (20 μM final)                   40 μL
Total                                                        100 μL
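The final concentrations listed above follow from the usual dilution relationship (stock concentration × volume added ÷ total volume). A minimal sketch, assuming the stated 100 μL total reaction volume; the dictionary keys and function name are illustrative only:

```python
# Stock concentration (uM) and volume added (uL), as listed in the reaction table.
TOTAL_VOLUME_UL = 100.0  # assumption: the stated total reaction volume

OLIGOS = {
    "phosphorylated_mix": (263.0, 2.7),
    "splint": (248.0, 5.65),
    "barcoded": (50.0, 40.0),
}


def final_concentration_um(stock_um: float, volume_ul: float,
                           total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Dilution formula C_final = C_stock * V_added / V_total."""
    return stock_um * volume_ul / total_ul


if __name__ == "__main__":
    for name, (stock, volume) in OLIGOS.items():
        print(f"{name}: {final_concentration_um(stock, volume):.1f} uM final")
```

Rounded to the nearest micromolar, the three values agree with the 7, 14, and 20 μM final concentrations given in the table.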

The 5′-phosphorylated oligonucleotide mix consisted of an equimolar mixture of 40 different gene-specific oligos. 96 different reactions were set up in separate tubes, each one with a different barcoded oligonucleotide. To anneal the oligonucleotides to the splint, the reaction mixes were heated on a thermal cycler as follows: 95° C. for 30 sec, 70° C. for 20 sec, then the temperature was decreased by 2.5° C. every 20 sec until the samples reached 25° C.
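The annealing program described above can be written out step by step. The following is a sketch under one stated assumption: each temperature in the ramp, including the final 25° C. step, is held for 20 seconds. The function name and the tuple layout are illustrative and do not correspond to any particular thermal cycler's software.

```python
def annealing_profile(denature=(95.0, 30), start_c=70.0, end_c=25.0,
                      step_c=2.5, hold_s=20):
    """Return a list of (temperature in deg C, hold time in s) steps."""
    profile = [denature, (start_c, hold_s)]
    temp = start_c
    while temp > end_c:
        # Step down by step_c, never going below the end temperature.
        temp = max(temp - step_c, end_c)
        # Assumption: every ramp temperature is held for hold_s seconds.
        profile.append((temp, hold_s))
    return profile


if __name__ == "__main__":
    steps = annealing_profile()
    for temp, hold in steps:
        print(f"{temp:5.1f} C for {hold} s")
    print("total time:", sum(hold for _, hold in steps), "s")
```

Under this reading the ramp from 70° C. to 25° C. takes 18 decrements of 2.5° C.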

Then 2 μL of T4 DNA Ligase (400,000 U/mL, New England Biolabs) was added to each reaction, and after mixing, the reactions were incubated at 25° C. for at least 2 hours.

Then 20 μL of streptavidin-agarose high-capacity bead slurry (Thermo Scientific, Pierce) was added, and the samples were incubated at room temperature while being turned end-over-end on a rotisserie for at least 2 hours.

The streptavidin-agarose beads were then washed three times with 200 microliters of 10 mM Tris pH 7.6, 50 mM NaCl. The ligated (and unligated) DNA molecules were then eluted from the beads by heat-denaturation of the DNA duplex. The majority of the biotinylated splint oligo remained attached to the beads because the biotin-streptavidin interaction was not significantly disrupted by heating at 95° C. The elution was carried out in 2 steps. In the first step, the beads were heated to 95° C. in 40 microliters of the Tris/NaCl buffer for 30 seconds. The beads were quickly spun down by brief centrifugation, and then the supernatant containing the eluted DNA was removed and stored. In the second step, the same elution process was carried out, but with heating to 95° C. for 45 seconds in order to remove any remaining ligated DNA from the beads. The supernatants containing the ligated (and unligated) DNA from the first and second elution steps were combined into a total volume of 80 microliters. This process yielded approximately 600 to 700 picomoles of ligated oligonucleotides in 80 microliters of buffer, for a final concentration of approximately 7-8 micromolar.

1.-15. (canceled)
 16. A method of identifying a DNA sequence error produced by a sequencing instrument, the method comprising: a) generating a plurality of clonal DNA clusters, wherein a clonal DNA cluster comprises a plurality of top strand DNA copies and a plurality of bottom strand DNA copies, wherein all copies are derived from a single template DNA molecule; b) using the sequencing instrument to read a sequence from only the top strand DNA copies or only the bottom strand DNA copies within the clonal DNA cluster, thereby producing a first clonal sequence read; c) using the sequencing instrument to separately read, within the same clonal DNA cluster, a sequence from DNA strands that are in the opposite orientation to those used to produce the first clonal sequence read, thereby producing a second clonal sequence read; d) aligning the first clonal sequence read and the second clonal sequence read based on sequence complementarity in a region of expected sequence overlap; and e) identifying the DNA sequence error at a position that lacks perfect sequence complementarity in the region of expected sequence overlap between the first clonal sequence read and the second clonal sequence read.
 17. The method of claim 16, wherein the first clonal sequence read and the second clonal sequence read overlap completely.
 18. The method of claim 16, wherein the first clonal sequence read and the second clonal sequence read overlap partially.
 19. The method of claim 16, wherein the lack of perfect sequence complementarity can occur due to mismatches at a plurality of nucleotide positions, a mismatch at a single nucleotide position, a deletion, or an insertion.
 20. The method of claim 16, wherein improvements in sensitivity and accuracy of quantification of low-abundance sequence variants are enabled by identifying the DNA sequence errors.
 21. The method of claim 20, wherein the low-abundance sequence variants comprise mutant tumor-derived DNA fragments extracted from blood.
 22. A method of synthesizing modular oligonucleotide primer mixes, the method comprising: a) synthesizing a plurality of 3′ oligonucleotide segments comprising a plurality of target-specific primer sequences, wherein each target-specific primer sequence is synthesized in a separate synthesis column on solid supports; b) pausing the synthesis; c) pooling and thoroughly mixing all solid supports from all synthesis columns containing the partially-synthesized 3′ oligonucleotide segments; d) dispensing the pooled mixture of partially-synthesized 3′ oligonucleotide segments into a plurality of new synthesis columns; e) resuming synthesis to add a 5′ oligonucleotide segment comprising a unique sample-specific barcode to each new synthesis column; and f) cleaving and deprotecting the oligonucleotides from the solid supports in each column, yielding a plurality of modular oligonucleotide mixes, wherein each mix contains a unique, sample-specific barcode and a plurality of target-specific primer sequences.
 23. The method of claim 22, further comprising incorporation of a molecular lineage tag in the 5′ oligonucleotide segment.
 24. The method of claim 22, wherein the modular oligonucleotide primer mixes enable early assignment of sample-specific barcodes to a plurality of target sequences from a plurality of samples.
 25. The method of claim 24, wherein the target sequences comprise DNA.
 26. The method of claim 24, wherein the target sequences comprise RNA.
 27. The method of claim 24, wherein the plurality of samples comprise clinical specimens.
 28. The method of claim 27, wherein the clinical specimens are from different clinical patients.
 29. The method of claim 27, wherein the clinical specimens are from different clinical time points. 